GATE API and JAPE code, return empty result

https://stackoverflow.com/questions/23285082

09-07-2023

Question

I used the GATE API from Java code and tried to run a well-known JAPE rule on the text of a document, but unfortunately I did not get the expected results. My code is as follows:

public void initAnnie() throws GateException, IOException {
    Out.prln("Initialising ANNIE...");

    // load the ANNIE application from the saved state in plugins/ANNIE
    File pluginsHome = Gate.getPluginsHome();
    File anniePlugin = new File(pluginsHome, "ANNIE");
    File annieGapp = new File(anniePlugin, "ANNIE_with_defaults.gapp");
    annieController = (CorpusController) PersistenceManager
            .loadObjectFromFile(annieGapp);

    Out.prln("...ANNIE loaded");
} // initAnnie()

/** Tell ANNIE's controller about the corpus you want to run on */
public void setCorpus(Corpus corpus) {
    annieController.setCorpus(corpus);
} // setCorpus

/** Run ANNIE */
public void execute() throws GateException {
    Out.prln("Running ANNIE...");
    annieController.execute();
    Out.prln("...ANNIE complete");
} // execute()

/**
 * Run from the command-line, with a list of URLs as argument.
 * <P>
 * <B>NOTE:</B><BR>
 * This code will run with all the documents in memory - if you want to
 * unload each from memory after use, add code to store the corpus in a
 * DataStore.
 */
public static void main(String args[]) throws GateException, IOException {
    // initialise the GATE library
    Out.prln("Initialising GATE...");
    Gate.init();
    Out.prln("...GATE initialised");

    // load ANNIE plugin - you must do this before you can create tokeniser
    // or JAPE transducer resources.
    Gate.getCreoleRegister().registerDirectories(
            new File(Gate.getPluginsHome(), "ANNIE").toURI().toURL());

    // Build the pipeline
    SerialAnalyserController pipeline =
            (SerialAnalyserController) Factory.createResource(
                    "gate.creole.SerialAnalyserController");
    LanguageAnalyser tokeniser = (LanguageAnalyser) Factory.createResource(
            "gate.creole.tokeniser.DefaultTokeniser");
    LanguageAnalyser jape = (LanguageAnalyser) Factory.createResource(
            "gate.creole.Transducer", gate.Utils.featureMap(
                    "grammarURL",
                    new File("C:path/to/univerity_rules.jape").toURI().toURL(),
                    "encoding", "UTF-8")); // ensure this matches the file
    pipeline.add(tokeniser);
    pipeline.add(jape);

    // create document and corpus
    // create a GATE corpus and add a document for each command-line
    // argument
    Corpus corpus = Factory.newCorpus("JAPE corpus");

    URL u = new URL("file:/path/to/Document.txt");
    FeatureMap params = Factory.newFeatureMap();
    params.put("sourceUrl", u);
    params.put("preserveOriginalContent", new Boolean(true));
    params.put("collectRepositioningInfo", new Boolean(true));
    Out.prln("Creating doc for " + u);
    Document doc = (Document)
            Factory.createResource("gate.corpora.DocumentImpl", params);
    corpus.add(doc);
    pipeline.setCorpus(corpus);

    // run it
    pipeline.execute();

    // extract results
    System.out.println("Found annotations of the following types: " +
            doc.getAnnotations().getAllTypes());
} // main

}

The JAPE rule I used is as follows:

Phase:firstpass 
Input: Lookup Token 

//note that we are using Lookup and Token both inside our rules. 
Options: control = appelt


Rule: University1 
Priority: 20
(
  {Token.string == "University"} 
  {Token.string == "of"}
  {Lookup.minorType == city} 
):orgName 
-->
:orgName.Organisation = 
  {kind = "university", rule = "University1"}

Finally, this is the output I got:

 Initialising GATE...
 log4j:WARN No appenders could be found for logger (gate.Gate).
 log4j:WARN Please initialize the log4j system properly.
 log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
  ...GATE initialised
 Creating doc for file:path/to/Document.txt
 Found annotations of the following types: [SpaceToken, Token]

Any help would be appreciated.

Solution

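The problem is that your pipeline never creates the "Lookup" annotations that your JAPE rule matches on. It runs only the tokeniser before the transducer, which is why the output lists just "Token" and "SpaceToken".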
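To illustrate why the rule cannot fire, here is a toy model in plain Java. It is not the GATE API: the `Ann` class, the `university1Matches` method, and the sample text "Sheffield" are all made up for this sketch, and annotations are flattened into a simple list instead of GATE's annotation graph. It only shows that a pattern ending in a "Lookup" element has nothing to match when the document holds only "Token" annotations.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of a JAPE-style match; NOT the GATE API. */
final class LookupMatchSketch {

    /** Simplified annotation: a type, the covered text, and an optional minorType. */
    static final class Ann {
        final String type;
        final String text;
        final String minorType; // null when the annotation has no such feature

        Ann(String type, String text, String minorType) {
            this.type = type;
            this.text = text;
            this.minorType = minorType;
        }
    }

    /**
     * Mimics the University1 pattern: Token "University", Token "of",
     * then a Lookup whose minorType is "city".
     */
    static boolean university1Matches(List<Ann> anns) {
        for (int i = 0; i + 2 < anns.size(); i++) {
            Ann a = anns.get(i);
            Ann b = anns.get(i + 1);
            Ann c = anns.get(i + 2);
            boolean matched = a.type.equals("Token") && a.text.equals("University")
                    && b.type.equals("Token") && b.text.equals("of")
                    && c.type.equals("Lookup") && "city".equals(c.minorType);
            if (matched) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Tokeniser only: the city name is just another Token.
        List<Ann> tokensOnly = Arrays.asList(
                new Ann("Token", "University", null),
                new Ann("Token", "of", null),
                new Ann("Token", "Sheffield", null));

        // Tokeniser plus gazetteer: a Lookup now covers the city name.
        List<Ann> withLookup = Arrays.asList(
                new Ann("Token", "University", null),
                new Ann("Token", "of", null),
                new Ann("Lookup", "Sheffield", "city"));

        System.out.println(university1Matches(tokensOnly)); // prints false
        System.out.println(university1Matches(withLookup)); // prints true
    }
}
```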

You need to add two additional processing resources:

    LanguageAnalyser gazetteer = (LanguageAnalyser)Factory.createResource(
            "gate.creole.gazetteer.DefaultGazetteer");
    LanguageAnalyser splitter = (LanguageAnalyser)Factory.createResource(
            "gate.creole.splitter.SentenceSplitter");

Your processing resources should run in the following order:

    pipeline.add(tokeniser);
    pipeline.add(gazetteer);
    pipeline.add(splitter);
    pipeline.add(jape);

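The gazetteer creates the "Lookup" annotations that the rule needs.

The sentence splitter adds "Sentence" and "Split" annotations, which can be used to stop "Organisation" annotations from spanning two sentences. Note that, as far as I know, JAPE only considers the annotation types listed on the Input line, so "Split" would have to be added there for this to take effect.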
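The ordering requirement can be pictured as a small dependency check. The sketch below is plain Java and not part of GATE: `Stage`, `missingInputs`, and the stage constants are hypothetical names. The "produces" sets follow the annotation types in the output shown in this answer, while the "requires" sets are simplifying assumptions. The check walks the pipeline in order and reports every annotation type that a stage reads before any earlier stage has written it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy dependency check for a processing pipeline; NOT the GATE API. */
final class PipelineOrderSketch {

    /** A stage with the annotation types it reads and the types it writes. */
    static final class Stage {
        final String name;
        final Set<String> requires;
        final Set<String> produces;

        Stage(String name, Set<String> requires, Set<String> produces) {
            this.name = name;
            this.requires = requires;
            this.produces = produces;
        }
    }

    static Set<String> types(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // "requires" sets are assumptions made for this sketch.
    static final Stage TOKENISER =
            new Stage("tokeniser", types(), types("Token", "SpaceToken"));
    static final Stage GAZETTEER =
            new Stage("gazetteer", types(), types("Lookup"));
    static final Stage SPLITTER =
            new Stage("splitter", types("Token"), types("Sentence", "Split"));
    static final Stage JAPE =
            new Stage("jape", types("Token", "Lookup"), types("Organisation"));

    /** Returns the types that some stage needs but no earlier stage produced. */
    static Set<String> missingInputs(List<Stage> pipeline) {
        Set<String> available = new HashSet<>();
        Set<String> missing = new TreeSet<>(); // sorted for stable printing
        for (Stage stage : pipeline) {
            for (String type : stage.requires) {
                if (!available.contains(type)) {
                    missing.add(type);
                }
            }
            available.addAll(stage.produces);
        }
        return missing;
    }

    public static void main(String[] args) {
        // Pipeline from the question: the transducer runs without a gazetteer.
        System.out.println(missingInputs(Arrays.asList(TOKENISER, JAPE)));
        // prints [Lookup]

        // Pipeline from this answer: every input is produced before it is read.
        System.out.println(missingInputs(
                Arrays.asList(TOKENISER, GAZETTEER, SPLITTER, JAPE)));
        // prints []
    }
}
```

Placing the gazetteer after the transducer would fail the same check, because "Lookup" would not yet exist when the rule runs.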

I tested this, and it works for me:

...GATE initialised
Creating doc for file:/Users/andreyshafirin/tmp/testdoc.txt
Found annotations of the following types: [Lookup, Organisation, Token, Split, SpaceToken, Sentence]

PS:

I think there is a better way to work with GATE from Java code. You can create an application in GATE Developer, customize it, and save it to a file. You can then load the saved application from your Java code, much as your initAnnie() method does with PersistenceManager.loadObjectFromFile. This way you don't have to worry about the many details and parameters of the individual processing resources in code, because you define and change them in the GUI.

Good luck with GATE.

Licensed under: CC-BY-SA with attribution
