Get certain nodes out of a Parse Tree

https://stackoverflow.com/questions/10474827

06-06-2021

Question

I am working on a project involving anaphora resolution via Hobbs' algorithm. I have parsed my text using the Stanford parser, and now I would like to manipulate the nodes in order to implement my algorithm.

At the moment, I don't understand how to:

- Access a node based on its POS tag (e.g. I need to start with a pronoun: how do I get all pronouns?).
- Use visitors. I'm fairly new to Java, but in C++ I would implement a Visitor functor and then work with its hooks. I could not find much on this for the Stanford Parser's Tree structure, though. Is it jgrapht? If it is, could you point me to some code snippets?

Solution

@dhg's answer works fine, but here are two other options that may also be useful to know about:

The Tree class implements Iterable. You can iterate through all the nodes of a Tree, or, strictly, the subtrees headed by each node, in a pre-order traversal, with:
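```
// Collect every subtree whose root label is PRP (personal pronoun).
List<Tree> pronouns = new ArrayList<>();
for (Tree subtree : t) {
    if (subtree.label().value().equals("PRP")) {
        pronouns.add(subtree);
    }
}
```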
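
To make the traversal order concrete, here is a minimal, self-contained sketch of how a pre-order `Iterable` over subtrees can be built. `LabeledNode` and `PreOrderDemo` are hypothetical stand-ins written for this illustration; they are not the Stanford parser's `Tree` class, whose actual implementation may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the parser's Tree class: a label plus ordered children.
class LabeledNode implements Iterable<LabeledNode> {
    final String label;
    final List<LabeledNode> children;

    LabeledNode(String label, LabeledNode... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    // Pre-order iteration: a node is returned before any of its descendants.
    @Override
    public Iterator<LabeledNode> iterator() {
        final Deque<LabeledNode> stack = new ArrayDeque<>();
        stack.push(this);
        return new Iterator<LabeledNode>() {
            @Override
            public boolean hasNext() {
                return !stack.isEmpty();
            }

            @Override
            public LabeledNode next() {
                if (stack.isEmpty()) {
                    throw new NoSuchElementException();
                }
                LabeledNode node = stack.pop();
                // Push children right to left so the leftmost child is visited first.
                for (int i = node.children.size() - 1; i >= 0; i--) {
                    stack.push(node.children.get(i));
                }
                return node;
            }
        };
    }
}

class PreOrderDemo {
    // Builds (S (NP (PRP he)) (VP (VBZ barks))) by hand.
    static LabeledNode sample() {
        return new LabeledNode("S",
            new LabeledNode("NP", new LabeledNode("PRP", new LabeledNode("he"))),
            new LabeledNode("VP", new LabeledNode("VBZ", new LabeledNode("barks"))));
    }

    // Returns the labels in the order the iterator visits them.
    static List<String> labelsInOrder(LabeledNode root) {
        List<String> labels = new ArrayList<>();
        for (LabeledNode node : root) {
            labels.add(node.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(labelsInOrder(sample()));
    }
}
```

For the hand-built tree, `main` prints `[S, NP, PRP, he, VP, VBZ, barks]`: each node appears before its descendants, so a label check inside the loop sees every tag node as well as every word.
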
You can also get just the nodes that satisfy some (potentially quite complex) pattern by using tregex, which behaves rather like java.util.regex but matches patterns over trees. You would have something like:
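```
// Compile the pattern once, then walk through every match in tree t.
List<Tree> pronouns = new ArrayList<>();
TregexPattern tgrepPattern = TregexPattern.compile("PRP");
TregexMatcher m = tgrepPattern.matcher(t);
while (m.find()) {
    Tree subtree = m.getMatch();  // the subtree rooted at the matched node
    pronouns.add(subtree);
}
```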

Other tips

Here's a simple example that parses a sentence and finds all of the pronouns.

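```
import java.util.ArrayList;

import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.Tree;

// The wrapper class name is arbitrary.
public class FindPronouns {

    // Recursively collects every subtree labelled PRP (personal pronoun).
    private static ArrayList<Tree> findPro(Tree t) {
        ArrayList<Tree> pronouns = new ArrayList<Tree>();
        if (t.label().value().equals("PRP"))
            pronouns.add(t);
        else
            for (Tree child : t.children())
                pronouns.addAll(findPro(child));
        return pronouns;
    }

    public static void main(String[] args) {
        // Load the default parser model.
        LexicalizedParser parser = LexicalizedParser.loadModel();
        // On newer parser releases, parse(String) may be needed instead of apply(String).
        Tree x = parser.apply("The dog walks and he barks .");
        System.out.println(x);
        ArrayList<Tree> pronouns = findPro(x);
        System.out.println("All Pronouns: " + pronouns);
    }
}
```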

This prints:

    (ROOT (S (S (NP (DT The) (NN dog)) (VP (VBZ walks))) (CC and) (S (NP (PRP he)) (VP (VBZ barks))) (. .)))
    All Pronouns: [(PRP he)]
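
On the visitor part of the question: the same recursion can be packaged as a visitor, if that style is more familiar from C++. The following is only a sketch under stated assumptions. `ParseNode`, `NodeVisitor`, `TagCollector` and `VisitorDemo` are hypothetical names invented for this illustration, not classes from the Stanford parser, and treating `PRP$` (the Penn Treebank tag for possessive pronouns) as a pronoun is an assumption about what the algorithm needs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a parse-tree node; not the parser's own Tree class.
class ParseNode {
    final String label;
    final List<ParseNode> children;

    ParseNode(String label, ParseNode... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    // Calls the visitor on this node first, then on each child from left to right.
    void accept(NodeVisitor visitor) {
        visitor.visit(this);
        for (ParseNode child : children) {
            child.accept(visitor);
        }
    }
}

// Hypothetical visitor hook: invoked once per node.
interface NodeVisitor {
    void visit(ParseNode node);
}

// Visitor that remembers every node whose label is in a given tag set.
class TagCollector implements NodeVisitor {
    private final Set<String> tags;
    final List<ParseNode> matches = new ArrayList<>();

    TagCollector(String... tags) {
        this.tags = new HashSet<>(Arrays.asList(tags));
    }

    @Override
    public void visit(ParseNode node) {
        if (tags.contains(node.label)) {
            matches.add(node);
        }
    }
}

class VisitorDemo {
    // Builds (S (NP (PRP he)) (VP (VBZ walks) (NP (PRP$ his) (NN dog)))) by hand.
    static ParseNode sample() {
        return new ParseNode("S",
            new ParseNode("NP", new ParseNode("PRP", new ParseNode("he"))),
            new ParseNode("VP",
                new ParseNode("VBZ", new ParseNode("walks")),
                new ParseNode("NP",
                    new ParseNode("PRP$", new ParseNode("his")),
                    new ParseNode("NN", new ParseNode("dog")))));
    }

    // Returns the words under the matched pronoun tags, in visiting order.
    static List<String> pronounWords(ParseNode root) {
        TagCollector collector = new TagCollector("PRP", "PRP$");
        root.accept(collector);
        List<String> words = new ArrayList<>();
        for (ParseNode match : collector.matches) {
            words.add(match.children.get(0).label);
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(pronounWords(sample()));
    }
}
```

Running `main` prints `[he, his]`. With the real parser a hand-written visitor is optional, since iterating over the tree or using a tregex pattern covers the same ground with less code.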

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow