Question

Is there a way in UIMA to access the annotations associated with each token, the way the CAS debugger GUI does? You can of course get all the annotations from the index repository, but I want to loop over the tokens and get all the annotations associated with each token.

The reason is simple: I want to check some annotations and discard the others, and this way it is much easier. Any help is appreciated :)

Solution 2

After searching and asking the developers of cTAKES (Apache clinical Text Analysis and Knowledge Extraction System), I found that you can use the uimaFIT library, which can be found at http://code.google.com/p/uimafit/ . The following code can be used:

// Pass any Annotation subclass in place of Annotation.class to narrow the result.
List<Annotation> list = JCasUtil.selectCovered(jcas, Annotation.class, startIndex, endIndex);

This returns all annotations of the given type that lie between the two indices.
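To make the "loop over tokens, collect what each one covers" idea concrete, here is a minimal plain-Java sketch with no UIMA dependency. The `Span` class and the local `selectCovered` helper are hypothetical stand-ins invented for illustration; they only mimic the containment check and are not the uimaFIT API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TokenCoverageSketch {
    /** Hypothetical stand-in for an annotation: a label plus character offsets. */
    static final class Span {
        final String label;
        final int begin;
        final int end;

        Span(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Returns the spans that lie entirely within [begin, end], in input order. */
    static List<Span> selectCovered(List<Span> spans, int begin, int end) {
        List<Span> covered = new ArrayList<>();
        for (Span s : spans) {
            if (s.begin >= begin && s.end <= end) {
                covered.add(s);
            }
        }
        return covered;
    }

    public static void main(String[] args) {
        // Offsets refer to the text "Aspirin helps".
        List<Span> tokens = Arrays.asList(
                new Span("Token", 0, 7), new Span("Token", 8, 13));
        List<Span> others = Arrays.asList(
                new Span("Drug", 0, 7), new Span("Sentence", 0, 13));

        // For every token, list the other spans that fall inside its offsets.
        for (Span token : tokens) {
            StringBuilder line = new StringBuilder(
                    "Token " + token.begin + "-" + token.end + ":");
            for (Span s : selectCovered(others, token.begin, token.end)) {
                line.append(' ').append(s.label);
            }
            System.out.println(line);
        }
    }
}
```

Note that a containment check like this only finds spans inside the token. A sentence that merely contains the token is not returned, so finding enclosing annotations needs the opposite test.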

Hope that will help

OTHER TIPS

I'm a uimaFIT developer.

If you want to find all annotations within the boundaries of another annotation, you may prefer the shorter and faster variant

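// Pass any Annotation subclass in place of Annotation.class to narrow the result.
List<Annotation> covered = JCasUtil.selectCovered(Annotation.class, referenceAnnotation);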

Mind that it is not a good idea to create a "dummy" annotation with the desired offsets and then search within its boundaries, because this immediately allocates memory in the CAS, which is not garbage-collected unless the complete CAS is collected.

If you don't want to use uimaFIT, you can create a filtered iterator to loop through the annotations of interest. Filtered iterators are described in the UIMA reference documentation.

I recently used this approach in some code to find a sentence annotation that encompassed a regex annotation. This approach was acceptable for our project because all regular expression matches were shorter than the sentences in the document, and there was only one regex match per sentence. Obviously, based on indexing rules, your mileage may vary. If you are afraid of running into another shorterAnnotationType, put the inner code into a while loop:

import java.util.ArrayList;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.cas.FSTypeConstraint;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

// annotationsPair is my own holder class for the two annotations (not shown here).
static ArrayList<annotationsPair> process(Annotation shorterAnnotationType,
        Annotation longerAnnotationType, JCas aJCas){

    ArrayList<annotationsPair> annotationsList = new ArrayList<annotationsPair>();

    // Restrict the default annotation index to the two types of interest.
    FSIterator<Annotation> it = aJCas.getAnnotationIndex().iterator();
    FSTypeConstraint constraint = aJCas.getConstraintFactory().createTypeConstraint();
    constraint.add(shorterAnnotationType.getType());
    constraint.add(longerAnnotationType.getType());
    it = aJCas.createFilteredIterator(it, constraint);

    Annotation a = null;
    int shorterBegin = -1;
    int shorterEnd = -1;
    // Start iterating at the position of the given shorter annotation.
    it.moveTo(shorterAnnotationType);
    while (it.isValid()) {
        a = it.get();
        if (a.getClass() == shorterAnnotationType.getClass()){
            shorterBegin = a.getBegin();
            shorterEnd = a.getEnd();
            System.out.println("Target annotation from " + shorterBegin
                    + " to " + shorterEnd);
            //because assume that sentence type is longer than other type,
            //the sentence gets indexed prior
            it.moveToPrevious();
            if(it.isValid()){
                Annotation prevAnnotation = it.get();
                if (prevAnnotation.getClass() == longerAnnotationType.getClass()){
                    int sentBegin = prevAnnotation.getBegin();
                    int sentEnd = prevAnnotation.getEnd();
                    System.out.println("found annotation [" + prevAnnotation.getCoveredText()
                            + "] location: " + sentBegin + ", " + sentEnd);
                    annotationsPair pair = new annotationsPair(a, prevAnnotation);
                    annotationsList.add(pair);
                }
                //return to where you started
                it.moveToNext(); //will not invalidate iter because just came from next
            } else {
                //the target was the first element, so stepping back invalidated
                //the iterator; reposition on the first element to keep looping
                it.moveToFirst();
            }
        }
        it.moveToNext();
    }

    return annotationsList;

}
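The "step back one element" trick relies on the ordering of the default annotation index, which sorts by begin offset ascending and then by end offset descending (type priorities, the final tie-breaker, are left out here). The plain-Java sketch below mimics that ordering without any UIMA dependency. `Span`, `INDEX_ORDER` and `previous` are hypothetical names invented for illustration, not part of UIMA.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class IndexOrderSketch {
    /** Hypothetical stand-in for an annotation: a label plus character offsets. */
    static final class Span {
        final String label;
        final int begin;
        final int end;

        Span(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Begin ascending, then end descending: longer spans sort first on a tie. */
    static final Comparator<Span> INDEX_ORDER = (a, b) ->
            a.begin != b.begin
                    ? Integer.compare(a.begin, b.begin)
                    : Integer.compare(b.end, a.end);

    /** Returns the span directly before target in index order, or null if none. */
    static Span previous(List<Span> spans, Span target) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(INDEX_ORDER);
        int i = sorted.indexOf(target); // identity match: Span does not override equals
        return i > 0 ? sorted.get(i - 1) : null;
    }

    public static void main(String[] args) {
        Span sentence1 = new Span("Sentence", 0, 20);
        Span match1 = new Span("Regex", 0, 5);    // starts where its sentence starts
        Span sentence2 = new Span("Sentence", 21, 40);
        Span match2 = new Span("Regex", 25, 30);  // starts inside its sentence
        List<Span> all = Arrays.asList(match2, sentence2, match1, sentence1);

        // With one match per sentence, the previous element is the enclosing sentence.
        Span p1 = previous(all, match1);
        Span p2 = previous(all, match2);
        System.out.println(p1.label + " " + p1.begin + "-" + p1.end);
        System.out.println(p2.label + " " + p2.begin + "-" + p2.end);
    }
}
```

If a sentence held two matches, the element before the second match would be the first match rather than the sentence, which is why the loop suggested above would be needed.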

Hope this helps! Disclaimer: I am new to UIMA.

Licensed under: CC-BY-SA with attribution