How to get POS tagging using Stanford Parser

https://stackoverflow.com/questions/3733587

03-10-2019

Question

I'm using the Stanford Parser to extract the dependency relations between pairs of words, but I also need the POS tag of each word. However, ParseDemo.java only outputs the parse tree. I need each word's tag, like this:

My/PRP$ dog/NN also/RB likes/VBZ eating/VBG bananas/NNS ./.

not like this:

(ROOT
  (S
    (NP (PRP$ My) (NN dog))
    (ADVP (RB also))
    (VP (VBZ likes)
      (S
        (VP (VBG eating)
          (S
            (ADJP (NNS bananas))))))
    (. .)))

Can anyone help? Thanks a lot.

Solution

If you're mainly interested in manipulating the tags in a program, and don't need the TreePrint functionality, you can just get the tagged words as a List:

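// 'sent' is assumed to hold the tokens of the sentence to parse.
LexicalizedParser lp =
  LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
Tree parse = lp.apply(Arrays.asList(sent));
// taggedYield() returns one TaggedWord (word plus POS tag) per token.
List<TaggedWord> taggedWords = parse.taggedYield();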
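To get the exact word/TAG line asked for in the question, the tagged words can be joined with a slash. The following is only a minimal sketch: so that it runs without the parser on the classpath, it uses plain (word, tag) pairs as a stand-in for the library's TaggedWord objects, and the class name SlashTagFormatter and its sample data are made up for illustration. With the real library you would read each pair from TaggedWord.word() and TaggedWord.tag().

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: (word, tag) pairs stand in for the parser's TaggedWord objects.
final class SlashTagFormatter {

    // Joins (word, tag) pairs into a single "word/TAG word/TAG ..." line.
    static String format(List<Map.Entry<String, String>> taggedWords) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> tw : taggedWords) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(tw.getKey()).append('/').append(tw.getValue());
        }
        return sb.toString();
    }

    // Sample data copied from the tags shown in the question.
    static List<Map.Entry<String, String>> sample() {
        List<Map.Entry<String, String>> words = new ArrayList<>();
        words.add(new SimpleEntry<>("My", "PRP$"));
        words.add(new SimpleEntry<>("dog", "NN"));
        words.add(new SimpleEntry<>("also", "RB"));
        words.add(new SimpleEntry<>("likes", "VBZ"));
        words.add(new SimpleEntry<>("eating", "VBG"));
        words.add(new SimpleEntry<>("bananas", "NNS"));
        words.add(new SimpleEntry<>(".", "."));
        return words;
    }

    public static void main(String[] args) {
        // Prints: My/PRP$ dog/NN also/RB likes/VBZ eating/VBG bananas/NNS ./.
        System.out.println(format(sample()));
    }
}
```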

OTHER TIPS

When running edu.stanford.nlp.parser.lexparser.LexicalizedParser on the command line, you want to use:

-outputFormat "wordsAndTags"

Programmatically, use the TreePrint class constructed with formatString="wordsAndTags" and call printTree, like this:

TreePrint posPrinter = new TreePrint("wordsAndTags", yourPrintWriter);
posPrinter.printTree(yourLexParser.getBestParse());

// Tag a pre-tokenized sentence; 'lp' is an already loaded LexicalizedParser.
String[] sent = { "This", "is", "an", "easy", "sentence", "." };
List<CoreLabel> rawWords = Sentence.toCoreLabelList(sent);
Tree parse = lp.apply(rawWords);
ArrayList<TaggedWord> ar = parse.taggedYield();
System.out.println(ar);

This answer is a bit outdated, so I decided to add my own. With Stanford Parser version 3.6.0 (Maven dependencies):

    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-parser</artifactId>
        <version>3.6.0</version>
    </dependency>
    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.6.0</version>
    </dependency>
    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.6.0</version>
        <classifier>models</classifier>
    </dependency>

private static MaxentTagger tagger = new MaxentTagger(MaxentTagger.DEFAULT_JAR_PATH);

public String getTaggedString(String someString) {
    return tagger.tagString(someString);
}

This returns I_PRP claim_VBP the_DT rights_NNS for the input 'I claim the rights'.
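The tagger output above uses an underscore between word and tag, while the question asks for word/TAG. One way to bridge that is plain string post-processing. The sketch below assumes only the word_TAG layout shown above; the class name TagSeparatorConverter is made up for illustration and is not part of the Stanford library.

```java
// Hypothetical helper: works on the tagger's output string only, no library calls.
final class TagSeparatorConverter {

    // Rewrites each "word_TAG" token as "word/TAG".
    // Splitting on the LAST underscore keeps words that contain '_' intact.
    static String toSlashFormat(String tagged) {
        String trimmed = tagged.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (String token : trimmed.split("\\s+")) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            int cut = token.lastIndexOf('_');
            if (cut < 0) {
                // No separator found: keep the token unchanged.
                sb.append(token);
            } else {
                sb.append(token, 0, cut).append('/').append(token.substring(cut + 1));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: I/PRP claim/VBP the/DT rights/NNS
        System.out.println(toSlashFormat("I_PRP claim_VBP the_DT rights_NNS"));
    }
}
```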

So if you want to detect verbs in a phrase using Java and the Stanford Parser, you can do this:

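public boolean containsVerb(String someString) {
    String taggedString = tagger.tagString(someString);
    // Tokens look like word_TAG; split on whitespace to get one per word.
    for (String tok : taggedString.trim().split("\\s+")) {
        // Use the last underscore so words containing '_' are handled,
        // and skip tokens that carry no tag instead of failing on them.
        int sep = tok.lastIndexOf('_');
        if (sep >= 0 && tok.startsWith("VB", sep + 1)) {
            return true;
        }
    }
    return false;
}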
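The same idea extends from a yes/no check to collecting the matching words. The sketch below works on an already tagged string in the word_TAG layout, so it runs without the tagger; the class name TaggedTokenFilter and the method wordsWithTagPrefix are illustrative names, not library API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: filters a "word_TAG word_TAG ..." string by tag prefix.
final class TaggedTokenFilter {

    // Returns the words whose tag starts with the given prefix, in sentence order.
    static List<String> wordsWithTagPrefix(String tagged, String prefix) {
        List<String> result = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            int cut = token.lastIndexOf('_');
            if (cut < 0) {
                continue; // token has no tag (for example, empty input)
            }
            if (token.startsWith(prefix, cut + 1)) {
                result.add(token.substring(0, cut));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String tagged = "I_PRP claim_VBP the_DT rights_NNS";
        System.out.println(wordsWithTagPrefix(tagged, "VB")); // prints [claim]
        System.out.println(wordsWithTagPrefix(tagged, "NN")); // prints [rights]
    }
}
```

Passing "VB" matches every tag that begins with those letters (VBP, VBZ, VBG and so on), which mirrors the startsWith("VB") test in containsVerb.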

Licensed under: CC-BY-SA with attribution
