Question

This is in regard to the following question:

UIMA RUTA - how to do find & replace using regular expression and groups

I'm trying to set up Sofa mappings as suggested there. I have an aggregate AE with several AEs and am trying to incorporate two RUTA AEs/scripts into this pipeline. Both RUTA AEs (and their associated scripts) perform a REGEXP find and replace using a Modifier. The second AE depends on the output of the first. I had to configure the Modifier's outputView of the second AE; otherwise I got a 'Sofa data already set' exception.

In essence, I'm unable to feed the output of one AE in as the input of the other.

The setup I have is similar to the following:

_initialview --Input> (Normalizer1 RUTA AE) --Output> norm_1_out
norm_1_out --Input> (Normalizer2 RUTA AE) --Output> norm_2_out
norm_2_out --Input> (Other AE)

Here's the aggregate AE descriptor:

<?xml version="1.0" encoding="UTF-8"?>

<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>false</primitive>
  <delegateAnalysisEngineSpecifiers>
    <delegateAnalysisEngine key="NormalizerPrepStep1">
      <import location="../../../ruta-annotators/desc/NormalizeNumbersEngine.xml"/>
    </delegateAnalysisEngine>

    <delegateAnalysisEngine key="NormalizerPrepStep2">
      <import location="../../../ruta-annotators/desc/NormalizeRangesEngine.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="Normalizer">
      <import location="../../../ruta-annotators/desc/NormalizerEngine.xml"/>
    </delegateAnalysisEngine>    
    <delegateAnalysisEngine key="SimpleAnnotator">
      <import location="../../../textanalyzer/desc/analysis_engine/SimpleAnnotator.xml"/>
    </delegateAnalysisEngine>
  </delegateAnalysisEngineSpecifiers>
  <analysisEngineMetaData>
    <name>RUTAAggregatePlaintextProcessor</name>
    <description>Runs the complete pipeline for annotating documents in plain text format.</description>
    <version/>
    <vendor/>
    <configurationParameters searchStrategy="language_fallback">
      <configurationParameter>
        <name>SegmentID</name>
        <description/>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
        <overrides>
          <parameter>SimpleAnnotator/SegmentID</parameter>
        </overrides>
      </configurationParameter>
    </configurationParameters>
    <configurationParameterSettings/>
    <flowConstraints>
      <fixedFlow>
        <node>NormalizerPrepStep1</node>
        <node>NormalizerPrepStep2</node>
        <node>Normalizer</node>
        <node>SimpleAnnotator</node>
      </fixedFlow>
    </flowConstraints>
    <typePriorities>
      <name>Ordering</name>
      <description>For subiterator</description>
      <version>1.0</version>
      <priorityList>
      </priorityList>
    </typePriorities>
    <fsIndexCollection/>
    <capabilities>
      <capability>
        <inputs/>
        <outputs/>
        <inputSofas>
          <sofaName>norm_1_out</sofaName>
          <sofaName>norm_2_out</sofaName>
          <sofaName>normalized</sofaName>
        </inputSofas>
        <languagesSupported/>
      </capability>
    </capabilities>
    <operationalProperties>
      <modifiesCas>true</modifiesCas>
      <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
      <outputsNewCASes>false</outputsNewCASes>
    </operationalProperties>
  </analysisEngineMetaData>
  <resourceManagerConfiguration/>
  <sofaMappings>
    <sofaMapping>
      <componentKey>SimpleAnnotator</componentKey>
      <aggregateSofaName>normalized</aggregateSofaName>
    </sofaMapping>
    <sofaMapping>
      <componentKey>NormalizerPrepStep2</componentKey>
      <aggregateSofaName>norm_1_out</aggregateSofaName>
    </sofaMapping>
    <sofaMapping>
      <componentKey>Normalizer</componentKey>
      <aggregateSofaName>norm_2_out</aggregateSofaName>
    </sofaMapping>
  </sofaMappings>
</analysisEngineDescription>

A few things to note:

  • all three RUTA AEs (step1, step2, normalizer) use the RUTA Modifier
  • the setup above throws the exception "No sofaFS with name norm_2_out found." after step 2
  • I tried switching the normalizer's input sofa from 'norm_2_out' to 'modified'. This moves processing on to the next step in the pipeline (normalizer), but that step then throws the exception "Data for Sofa feature setLocalSofaData() has already been set." at org.apache.uima.ruta.engine.RutaModifier.process(RutaModifier.java:107)
  • I have also tried RUTA 2.2.0 (snapshot), with the same result

As I'm relatively new to both UIMA and RUTA, I'm not sure whether I'm doing something wrong or running into a limitation.

BTW, I'm using RUTA 2.1.0

Thanks

Solution

The first thing I noticed in your example is that the AAE does not declare any output sofas. You have to specify them: they are all the sofas that are created within the AAE, e.g., by one of its components. Sofa mappings are also missing. You have to connect the output views of each AE with the input views of the AEs that consume them. In your example, I only see the default input views being mapped.

I created a unit test that can serve as an example for this task.
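
As a rough, untested sketch of the first point, the capability section of the aggregate in the question could declare the intermediate sofas as outputs rather than inputs. The sofa names are taken from the question. That each of them is written by one of the three Modifier stages is an assumption.

```xml
<capabilities>
  <capability>
    <inputs/>
    <outputs/>
    <!-- Sofas created inside the aggregate by its components are outputs.
         Assumption: norm_1_out, norm_2_out and normalized are each written
         by one of the three Modifier stages. -->
    <outputSofas>
      <sofaName>norm_1_out</sofaName>
      <sofaName>norm_2_out</sofaName>
      <sofaName>normalized</sofaName>
    </outputSofas>
    <languagesSupported/>
  </capability>
</capabilities>
```

Each of these names then needs a sofa mapping on both sides: one for the view the producing component writes to, and one for the view the consuming component reads from.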

The test is here: https://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/test/java/org/apache/uima/ruta/engine/CascadedModifierTest.java

The resources (descriptors) used in the test are here: https://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/test/resources/org/apache/uima/ruta/engine

Note that I removed the absolute paths from the Ruta descriptors and adapted the namespace of the imported scripts. For the test, they are now loaded from the classpath instead of via absolute paths.

The test calls an aggregate analysis engine AAE.xml, which imports and maps five analysis engines:

  • CWEngine.xml: a simple Ruta script (CW.ruta) that replaces capitalized words: CW{->REPLACE("CW")};
  • ModiferCW.xml: a normal modifier
  • SWEngine.xml: a simple Ruta script (SW.ruta) that replaces lowercase words: SW{->REPLACE("SW")};
  • ModiferSW.xml: a normal modifier
  • SimpleEngine.xml: a simple Ruta script (Simple.ruta) that defines a new type and matches on "CW" followed by "SW": DECLARE CwSw; ("CW" "SW"){-> CwSw};

The aggregate analysis engine defines three views: global1 (input), global2 (output) and global3 (output). The sofa mapping of the components is the following:

global1 -> [CWEngine, ModiferCW] -> global2 -> [SWEngine, ModiferSW] -> global3 -> [SimpleEngine]

Given the text "Peter is tired." in the view global1, the aggregate analysis engine creates two new views. The view global3 then contains the text "CW SW SW." and one annotation of the type Simple.CwSw.
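
The chain above could be expressed with sofa mappings along the following lines. This is only a sketch and not a copy of the AAE.xml from the test resources, which remains the authoritative version. It assumes that the component keys match the descriptor names listed above, that each component reads its default view (_InitialView), and that each modifier writes to a component-local view named modified, the view name mentioned in the question.

```xml
<sofaMappings>
  <!-- Stage 1: script and modifier both read the aggregate view global1. -->
  <sofaMapping>
    <componentKey>CWEngine</componentKey>
    <componentSofaName>_InitialView</componentSofaName>
    <aggregateSofaName>global1</aggregateSofaName>
  </sofaMapping>
  <sofaMapping>
    <componentKey>ModiferCW</componentKey>
    <componentSofaName>_InitialView</componentSofaName>
    <aggregateSofaName>global1</aggregateSofaName>
  </sofaMapping>
  <!-- The first modifier's output view (assumed name: modified) becomes global2. -->
  <sofaMapping>
    <componentKey>ModiferCW</componentKey>
    <componentSofaName>modified</componentSofaName>
    <aggregateSofaName>global2</aggregateSofaName>
  </sofaMapping>
  <!-- Stage 2: script and modifier read global2. -->
  <sofaMapping>
    <componentKey>SWEngine</componentKey>
    <componentSofaName>_InitialView</componentSofaName>
    <aggregateSofaName>global2</aggregateSofaName>
  </sofaMapping>
  <sofaMapping>
    <componentKey>ModiferSW</componentKey>
    <componentSofaName>_InitialView</componentSofaName>
    <aggregateSofaName>global2</aggregateSofaName>
  </sofaMapping>
  <!-- The second modifier's output view becomes global3. -->
  <sofaMapping>
    <componentKey>ModiferSW</componentKey>
    <componentSofaName>modified</componentSofaName>
    <aggregateSofaName>global3</aggregateSofaName>
  </sofaMapping>
  <!-- Stage 3: the final script annotates the twice-modified text. -->
  <sofaMapping>
    <componentKey>SimpleEngine</componentKey>
    <componentSofaName>_InitialView</componentSofaName>
    <aggregateSofaName>global3</aggregateSofaName>
  </sofaMapping>
</sofaMappings>
```

Because each modifier's local output view is mapped to a different aggregate sofa, no two modifiers try to set the document text of the same view, which is the situation behind the "has already been set" exception in the question.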

Licensed under: CC-BY-SA with attribution