Question

I am working with the RegEx Annotator in UIMA. I know that I can create a regex variable in the XML descriptor file like this:

<variables>
    <variable name="month" value="(Jan|Feb|March)" />
</variables>

and access it like this in the rule:

<rules>
    <rule regEx="Month: \v{month}" />
</rules>

which would match Month: Jan, Month: Feb and Month: March.

Now I want to use the variable inside another variable. Is that possible? I am looking for something like this:

<variables>
    <variable name="monthmonth" value="\v{month}\v{month}" />
</variables>

so that I can write a rule that matches Month: JanJan, for example.

I have read the documentation at http://uima.apache.org/downloads/sandbox/RegexAnnotatorUserGuide/RegexAnnotatorUserGuide.html#sandbox.regexAnnotator.conceptsFile.regexVariables and it does not mention whether a regex variable can be used inside another variable, although it does say that "The variables can be used in all concept definition within the same file."

I am using UIMA RegularExpressionAnnotator 2.3.1.

Any help is appreciated. =)

Solution

According to the docs,

The regex variable name can contain any of the following characters [a-zA-Z_0-9]. Other characters are not allowed.

If that's the only restriction, 123 would be a valid name, which you would refer to as value="\v{123}". But in a Java regex, {123} after an element is a quantifier. How is the parser supposed to know you mean the variable named "123" and not one hundred and twenty-three vertical tab characters?

In most languages (including XML), names can't start with digits. If that were the case here, it would be possible to embed variable names in variable definitions, because \v{month} or \v{_123} is guaranteed to be invalid in a Java regex, so the annotator could safely treat it as a variable reference. (Other regex flavors are more liberal; they would interpret the braces as literal brace characters because {month} and {_123} can't be interpreted as quantifiers.)
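
The claim about Java's regex parser is easy to check outside UIMA. Below is a minimal sketch using only java.util.regex; the class name BraceCheck and the helper compiles are made up for illustration and are not part of the annotator.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, not part of UIMA or the RegEx Annotator.
public class BraceCheck {

    // Returns true if java.util.regex accepts the expression as written.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Digits inside the braces are parsed as a quantifier on \v.
        System.out.println("\\v{123}   compiles: " + compiles("\\v{123}"));
        // A brace followed by a non-digit is rejected by Java's parser.
        System.out.println("\\v{month} compiles: " + compiles("\\v{month}"));
        System.out.println("\\v{_123}  compiles: " + compiles("\\v{_123}"));
    }
}
```

Only the all-digit form is accepted as a plain regex, which is exactly the case where a variable reference would be ambiguous.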

The only way to know for sure is to test it. It sounds like a nice feature to me; if it's not supported, maybe you should request it.
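
If testing shows that nested references are not expanded, one possible workaround is to expand them yourself before writing the values into the descriptor. The following is only a sketch of that idea: VariableExpander, expand and the depth limit of 10 are my own assumptions, and nothing here reflects how the annotator resolves variables internally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing helper, not part of UIMA or the RegEx Annotator.
public class VariableExpander {

    // Matches \v{name}, with name limited to the documented characters [a-zA-Z_0-9].
    private static final Pattern REF = Pattern.compile("\\\\v\\{([a-zA-Z_0-9]+)\\}");

    // Recursively replaces every \v{name} in text with the expanded value of that variable.
    static String expand(String text, Map<String, String> vars, int depth) {
        if (depth > 10) { // arbitrary limit, assumed here to guard against cyclic definitions
            throw new IllegalStateException("variable nesting too deep (cycle?)");
        }
        Matcher m = REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("undefined variable: " + m.group(1));
            }
            String expanded = expand(value, vars, depth + 1);
            // quoteReplacement keeps backslashes and $ in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(expanded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("month", "(Jan|Feb|March)");
        vars.put("monthmonth", "\\v{month}\\v{month}");

        String rule = expand("Month: \\v{monthmonth}", vars, 0);
        System.out.println(rule);
        System.out.println(Pattern.matches(rule, "Month: JanJan"));
    }
}
```

Note that this sketch treats \v{123} as a variable reference too, so it inherits the ambiguity described above; avoiding all-digit variable names sidesteps it.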

Licensed under: CC-BY-SA with attribution