Question

I am looking for something like this:

WORDLIST lemmas = 'lemmas.txt';
DECLARE Test;    
BLOCK(AnnotateTests) Token{} {
    STRING lemma;
    Token{->GETFEATURE("lemma", lemma)};
    INLIST(lemma, lemmas) -> MARK(Action); // <- How to do this?
}

I know this is broken code, but I would like to know how I can supply a list of terms from a text file and annotate all instances of, say, Token whose value for a certain feature (lemma in the example) is among the terms in the list. I know string equality is possible, but I was not able to find list membership in the documentation or figure it out myself.

Thanks!


Solution

In UIMA Ruta 2.1.0, the INLIST condition unfortunately does not accept additional arguments; it only checks the covered text of the matched annotation, so you cannot use it here. The CONTAINS condition accepts an additional argument, but not word lists. You also cannot apply the word list with MARKFAST, since the dictionary check is token-based.

The best solution for this problem is to ask the developers to add the functionality, or to add an external condition that provides it.

In UIMA Ruta 2.1.0, you could use StringListExpressions instead of word lists:

STRINGLIST LemmaSL = {"cat", "dog"}; // the content of the wordlist
Token{CONTAINS(LemmaSL, Token.lemma) -> MARK(Action)};
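
If the terms live in a text file, the STRINGLIST declaration can be generated rather than typed by hand. The following is only a minimal sketch of such a helper, written in Python and not part of UIMA Ruta: the function name to_ruta_stringlist is hypothetical, the input is assumed to hold one lemma per line, and the backslash escaping of quotes is an assumption about Ruta string literals that you should verify for your version.

```python
from typing import Iterable


def to_ruta_stringlist(lines: Iterable[str], name: str = "LemmaSL") -> str:
    """Render one-term-per-line text as a Ruta STRINGLIST declaration.

    Hypothetical helper for illustration; not part of UIMA Ruta.
    """
    terms = []
    for line in lines:
        term = line.strip()
        if term and term not in terms:  # skip blank lines and duplicates
            terms.append(term)
    # Assumption: Ruta string literals accept backslash-escaped quotes
    # and backslashes. Check this against your Ruta version.
    quoted = ['"' + t.replace("\\", "\\\\").replace('"', '\\"') + '"' for t in terms]
    return f"STRINGLIST {name} = {{{', '.join(quoted)}}};"


# In-memory example; any iterable of lines works the same way.
print(to_ruta_stringlist(["cat\n", "dog\n"]))
# prints: STRINGLIST LemmaSL = {"cat", "dog"};
```

An open file handle is also an iterable of lines, so passing open("lemmas.txt", encoding="utf-8") instead of the in-memory list should produce the declaration for the whole file, which can then be pasted into the script above the rule.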

In UIMA Ruta 2.2.0, the INLIST condition accepts an additional argument that is checked in place of the covered text of the matched annotation, which should solve your problem:

WORDLIST LemmaList = 'lemmas.txt';
Token{INLIST(LemmaList, Token.lemma) -> MARK(Action)};

DISCLAIMER: I am a developer of Apache UIMA Ruta.

Licensed under: CC-BY-SA with attribution