Question

I'm trying to get my head around the Stanford CoreNLP API. I want to tokenize a simple sentence using the following code:

    Properties props = new Properties();
    props.put("annotators", "tokenize");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // read some text in the text variable
    String text = "I wish this code would run.";

    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);

    // run all Annotators on this text
    pipeline.annotate(document);

    // these are all the sentences in this document
    // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    for(CoreMap sentence: sentences) {
        // traversing the words in the current sentence
        // a CoreLabel is a CoreMap with additional token-specific methods
        for (CoreLabel token: sentence.get(TokensAnnotation.class)) {
            // this is the text of the token
            String word = token.get(TextAnnotation.class);
            // this is the POS tag of the token
            String pos = token.get(PartOfSpeechAnnotation.class);
            // this is the NER label of the token
            String ne = token.get(NamedEntityTagAnnotation.class);       
        }

        // this is the parse tree of the current sentence
        Tree tree = sentence.get(TreeAnnotation.class);

        // this is the Stanford dependency graph of the current sentence
        SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
    }

    // This is the coreference link graph
    // Each chain stores a set of mentions that link to each other,
    // along with a method for getting the most representative mention
    // Both sentence and token offsets start at 1!
    Map<Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);

This is taken from the Stanford NLP website itself, so I hoped it would work out of the box. Sadly it doesn't: it gives me a NullPointerException at:

    for(CoreMap sentence: sentences) {...

Solution

The code you picked up from the Stanford NLP website performs all of the annotations on the text variable. To perform only specific annotations, you have to change the code accordingly.

To perform tokenization alone, this would be sufficient:

    import java.util.Properties;

    import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    Properties props = new Properties();
    props.put("annotators", "tokenize");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Annotation document = new Annotation(text);
    pipeline.annotate(document);
    for (CoreLabel token : document.get(TokensAnnotation.class)) {
        String word = token.get(TextAnnotation.class);
    }

The following line returns null if the annotators list doesn't include the sentence splitter ("ssplit"):

    document.get(SentencesAnnotation.class);

That is why you were encountering the NullPointerException.

Other tips

This line retrieves sentence annotations.

    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

But your pipeline contains only the tokenizer, not the sentence splitter.

Change the following line:

    props.put("annotators", "tokenize, ssplit"); // add sentence splitter
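Putting the fix together, here is a minimal self-contained sketch, assuming the same CoreNLP 3.x-style API used in the question, with the CoreNLP jar and models on the classpath (the class name `TokenizeExample` and the sample text are my own):

```java
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class TokenizeExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // tokenize splits the text into tokens; ssplit groups tokens into sentences
        props.put("annotators", "tokenize, ssplit");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation document = new Annotation("I wish this code would run. It does now.");
        pipeline.annotate(document);

        // SentencesAnnotation is populated only because ssplit is in the pipeline;
        // without it, this get() would return null and the loop below would throw NPE
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                System.out.println(token.get(TextAnnotation.class));
            }
        }
    }
}
```

Note that the POS, NER, parse, and coreference lookups in the original code would likewise return null unless the corresponding annotators ("pos", "lemma", "ner", "parse", "dcoref") are added to the annotators property.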
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow