Question

How can we group all annotations between two annotations?

I'm new to GATE and am trying to group annotations together. I'm not sure whether this is possible, so please help. For example, in the following text:

Page-1
Age:53 
Person: Nathan

Page-2
Treatment : Initial Evaluation
History: Yes

Page-3
..........

Suppose my gazetteer list contains different tags: a page tag for each page number, plus Age, Person, Treatment, History, etc. I want to group all tags between Page-1 and Page-2 under a Page-1 annotation, and all tags between Page-2 and Page-3 under Page-2.

Please let me know if more information is required on this question.

Thanks in advance.


Solution

I'm not entirely sure what you mean by "group together", but you can certainly create annotations that span the content of each "page". Assuming you have a PageNumber annotation on each "Page-1", "Page-2", etc., you can use something like the following to create annotations spanning from one PageNumber to the next. I'm using a JAPE grammar with control = once to do this; you could equivalently use a Groovy script or a custom PR (processing resource).

Imports: { import static gate.Utils.*; }
Phase: PageSpans
Input: PageNumber
Options: control = once

Rule: PageSpan
({PageNumber})
-->
{
  try {
    List<Annotation> numbers = inDocumentOrder(inputAS.get("PageNumber"));
    for(int i = 0; i < numbers.size(); i++) {
      outputAS.add(start(numbers.get(i)), // from start of this PageNumber, to...
                   (i+1 < numbers.size()
                     ? start(numbers.get(i+1)) // start of the next number, or...
                     : end(doc) // ...if no more PageNumbers then end of document
                   ),
                   "Page",
                   // store the text under the PageNumber as a feature of Page
                   featureMap("id", stringFor(doc, numbers.get(i))));
    }
  } catch(InvalidOffsetException e) {
    throw new JapeException("Invalid offset from existing annotation", e);
  }
}
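
To make the span arithmetic in the rule above easier to follow, here is a minimal sketch of the same loop in plain Java, without the GATE API. The class and method names (PageSpanSketch, Span, pageSpans) and the offsets in main are hypothetical and only for illustration; in the real grammar the offsets come from the PageNumber annotations.

```java
import java.util.ArrayList;
import java.util.List;

class PageSpanSketch {
    /** Hypothetical stand-in for a Page annotation: an id plus start/end offsets. */
    static final class Span {
        final String id;
        final long start;
        final long end;

        Span(String id, long start, long end) {
            this.id = id;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return id + "[" + start + "," + end + ")";
        }
    }

    /**
     * markerStarts: start offsets of the page markers, in document order.
     * markerTexts:  the text under each marker, e.g. "Page-1".
     * docEnd:       the end offset of the document.
     */
    static List<Span> pageSpans(long[] markerStarts, String[] markerTexts, long docEnd) {
        List<Span> spans = new ArrayList<>();
        for (int i = 0; i < markerStarts.length; i++) {
            // Each page runs from its own marker to the start of the next marker,
            // or to the end of the document if this is the last marker.
            long end = (i + 1 < markerStarts.length) ? markerStarts[i + 1] : docEnd;
            spans.add(new Span(markerTexts[i], markerStarts[i], end));
        }
        return spans;
    }

    public static void main(String[] args) {
        // Illustrative offsets only: markers at 0 and 30 in a 70-character document.
        System.out.println(pageSpans(new long[] {0, 30}, new String[] {"Page-1", "Page-2"}, 70));
    }
}
```

Note that the last page always extends to the end of the document, so any trailing content after the final marker belongs to that page.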

In your comment you ask about moving all the annotations under each "page" into a separate annotation set. This is relatively straightforward once you have done the above, provided you store the page number as a feature on your Page annotations, as I have done with the "id" feature. You could then define another JAPE grammar that does something like this:

Imports: { import static gate.Utils.*; }
Phase: SetPerPage
Input: Age X Y // and whatever other annotation types you want to copy
Options: control = all

Rule: MoveToPageSet
({Age}|{X}|{Y}):entity
-->
:entity {
  try {
    for(Annotation e : entityAnnots) {
      // find the (only) Page annotation that covers this entity
      Annotation thePage = getOnlyAnn(getCoveringAnnotations(inputAS, e, "Page"));
      // get the corresponding annotation set
      AnnotationSet pageSet = doc.getAnnotations(
              (String)thePage.getFeatures().get("id"));
      // and copy the annotation into it
      pageSet.add(start(e), end(e), e.getType(), e.getFeatures());
    }
  } catch(InvalidOffsetException e) {
    throw new JapeException("Invalid offset from existing annotation", e);
  }
  // optionally remove from input set
  // inputAS.removeAll(entityAnnots);
}
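
As a rough illustration of what the second grammar achieves, the following plain-Java sketch groups entity labels under the page span that covers them. It does not use the GATE API: PageGroupingSketch, Span and groupByPage are hypothetical names, and the offsets in main are made up to mirror the example text from the question. In the JAPE rule, the covering lookup is done by getCoveringAnnotations and the groups are separate annotation sets rather than map entries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PageGroupingSketch {
    /** Hypothetical stand-in for an annotation: a label (type or page id) plus offsets. */
    static final class Span {
        final String label;
        final long start;
        final long end;

        Span(String label, long start, long end) {
            this.label = label;
            this.start = start;
            this.end = end;
        }
    }

    /** Groups entity labels under the id of the first page span that covers them. */
    static Map<String, List<String>> groupByPage(List<Span> pages, List<Span> entities) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Span page : pages) {
            groups.put(page.label, new ArrayList<>());
        }
        for (Span entity : entities) {
            for (Span page : pages) {
                // A page covers an entity if the entity lies entirely inside it.
                if (page.start <= entity.start && entity.end <= page.end) {
                    groups.get(page.label).add(entity.label);
                    break; // page spans are assumed not to overlap
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Illustrative offsets only.
        List<Span> pages = List.of(new Span("Page-1", 0, 30), new Span("Page-2", 30, 70));
        List<Span> entities = List.of(
                new Span("Age", 7, 10),
                new Span("Person", 14, 20),
                new Span("Treatment", 37, 46),
                new Span("History", 66, 69));
        // prints {Page-1=[Age, Person], Page-2=[Treatment, History]}
        System.out.println(groupByPage(pages, entities));
    }
}
```

An entity that is not fully covered by any page is simply left out of every group in this sketch; in the JAPE version, getOnlyAnn would instead fail if there is not exactly one covering Page annotation.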
Licensed under: CC-BY-SA with attribution