Get certain nodes out of a Parse Tree
06-06-2021
Question
I am working on a project involving anaphora resolution via Hobbs' algorithm. I have parsed my text using the Stanford parser, and now I would like to manipulate the nodes in order to implement the algorithm.
At the moment, I don't understand how to:
Access a node based on its POS tag (e.g. I need to start with a pronoun; how do I get all pronouns?).
Use visitors. I'm a bit of a Java noob, but in C++ I needed to implement a Visitor functor and then work on its hooks. I could not find much on this for the Stanford Parser's Tree structure, though. Is it built on JGraphT? If so, could you give me some pointers to code snippets?
Solution
@dhg's answer works fine, but here are two other options that it might also be useful to know about:
The Tree class implements Iterable. You can iterate through all the nodes of a Tree (or, strictly, the subtrees headed by each node) in a pre-order traversal with:
List<Tree> pronouns = new ArrayList<>();
for (Tree subtree : t) {
    if (subtree.label().value().equals("PRP")) {
        pronouns.add(subtree);
    }
}
You can also get just the nodes that satisfy some (potentially quite complex) pattern by using tregex, which behaves rather like java.util.regex by allowing pattern matches over trees. You would have something like:
import edu.stanford.nlp.trees.tregex.TregexMatcher;
import edu.stanford.nlp.trees.tregex.TregexPattern;

TregexPattern tgrepPattern = TregexPattern.compile("PRP");
TregexMatcher m = tgrepPattern.matcher(t);
while (m.find()) {
    Tree subtree = m.getMatch();
    pronouns.add(subtree);
}
Other tips
Here's a simple example that parses a sentence and finds all of the pronouns.
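import java.util.ArrayList;
import java.util.List;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.Tree;

public class FindPronouns {
    // Recursively collect every subtree labeled PRP (personal pronoun).
    // PRP is a preterminal, so there is no need to search below it.
    private static List<Tree> findPro(Tree t) {
        List<Tree> pronouns = new ArrayList<>();
        if (t.label().value().equals("PRP")) {
            pronouns.add(t);
        } else {
            for (Tree child : t.children()) {
                pronouns.addAll(findPro(child));
            }
        }
        return pronouns;
    }

    public static void main(String[] args) {
        // Loads the default English PCFG model (the CoreNLP models jar must be on the classpath).
        LexicalizedParser parser = LexicalizedParser.loadModel();
        // parse(String) tokenizes and parses one sentence (older releases accepted apply(String)).
        Tree x = parser.parse("The dog walks and he barks .");
        System.out.println(x);
        List<Tree> pronouns = findPro(x);
        System.out.println("All Pronouns: " + pronouns);
    }
}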
This prints:
(ROOT (S (S (NP (DT The) (NN dog)) (VP (VBZ walks))) (CC and) (S (NP (PRP he)) (VP (VBZ barks))) (. .)))
All Pronouns: [(PRP he)]
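Neither answer uses a visitor, which the question also asked about. A C++-style visitor with enter/leave hooks is easy to write over any tree; the sketch below again uses a hypothetical minimal Node class rather than Stanford's Tree, and NodeVisitor and PronounPathVisitor are illustrative names, not CoreNLP API. As an example hook, the visitor records the chain of ancestors above each PRP node, since walking up from the pronoun is where Hobbs' algorithm starts. With the real Tree class you would recurse over t.children() the same way findPro does.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class VisitorDemo {
    // Hypothetical minimal tree node; NOT Stanford's Tree class.
    static class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        // Depth-first walk: call enter() before the children and leave() after them.
        void accept(NodeVisitor v) {
            v.enter(this);
            for (Node child : children) {
                child.accept(v);
            }
            v.leave(this);
        }
    }

    // Visitor with pre-order and post-order hooks, similar to a C++ visitor functor.
    interface NodeVisitor {
        void enter(Node n);
        void leave(Node n);
    }

    // Records, for every PRP node, the labels of its ancestors from the root down.
    static class PronounPathVisitor implements NodeVisitor {
        private final Deque<String> path = new ArrayDeque<>();
        final List<List<String>> pronounPaths = new ArrayList<>();

        @Override
        public void enter(Node n) {
            if (n.label.equals("PRP")) {
                // push() adds at the head, so descendingIterator() yields root-first order.
                List<String> ancestors = new ArrayList<>();
                path.descendingIterator().forEachRemaining(ancestors::add);
                pronounPaths.add(ancestors);
            }
            path.push(n.label);
        }

        @Override
        public void leave(Node n) {
            path.pop();
        }
    }

    static Node n(String label, Node... kids) {
        return new Node(label, kids);
    }

    public static void main(String[] args) {
        // Same shape as the parse of "The dog walks and he barks ." shown above.
        Node tree = n("ROOT", n("S",
                n("S", n("NP", n("DT", n("The")), n("NN", n("dog"))), n("VP", n("VBZ", n("walks")))),
                n("CC", n("and")),
                n("S", n("NP", n("PRP", n("he"))), n("VP", n("VBZ", n("barks")))),
                n(".", n("."))));
        PronounPathVisitor v = new PronounPathVisitor();
        tree.accept(v);
        System.out.println(v.pronounPaths); // [[ROOT, S, S, NP]]
    }
}
```

Keeping the traversal in accept() and the logic in the visitor means a new analysis only needs a new NodeVisitor implementation, which is the same separation the C++ functor approach gives you.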