The main factor in whether you get that extra nn dependency is whether dependencies are propagated across coordination: size is an nn of quality and is coordinated with picture, so picture is made an nn of quality too. The online output shows the collapsed dependencies with propagation, whereas you are calling the API method that doesn't include propagation. You can see both variants from the command line, using the options shown at the bottom of this post. In the API, to get coordination propagation, call gs.typedDependenciesCCprocessed() instead of gs.typedDependenciesCollapsed().
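To make the propagation rule concrete, here is a minimal sketch in plain Java with no Stanford Parser dependency. It is not how the parser implements propagation: the class and method names (CcPropagationSketch, propagate, cameraExample) are made up for illustration, and it covers only the single case at issue here, where a relation pointing at the first conjunct is copied to the second conjunct.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy illustration only: NOT the Stanford Parser's implementation.
// Each dependency is a String[] of {relation, governor, dependent}.
final class CcPropagationSketch {

    static String format(String[] d) {
        return d[0] + "(" + d[1] + ", " + d[2] + ")";
    }

    // The collapsed dependencies for the example sentence, without propagation.
    static List<String[]> cameraExample() {
        return Arrays.asList(
            new String[] {"det", "quality-5", "The-1"},
            new String[] {"nn", "quality-5", "size-2"},
            new String[] {"conj_and", "size-2", "picture-4"},
            new String[] {"nsubj", "perfect-10", "quality-5"},
            new String[] {"det", "camera-8", "the-7"},
            new String[] {"prep_of", "quality-5", "camera-8"},
            new String[] {"cop", "perfect-10", "is-9"},
            new String[] {"root", "ROOT-0", "perfect-10"});
    }

    // For every conj_*(first, second), copy each non-conj relation
    // rel(gov, first) to rel(gov, second). New dependencies are appended
    // at the end, so the order differs from the parser's own output.
    static List<String> propagate(List<String[]> collapsed) {
        Set<String> out = new LinkedHashSet<>();
        for (String[] d : collapsed) {
            out.add(format(d));
        }
        for (String[] conj : collapsed) {
            if (!conj[0].startsWith("conj")) {
                continue;
            }
            for (String[] d : collapsed) {
                if (!d[0].startsWith("conj") && d[2].equals(conj[1])) {
                    out.add(format(new String[] {d[0], d[1], conj[2]}));
                }
            }
        }
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        for (String dep : propagate(cameraExample())) {
            System.out.println(dep);
        }
    }
}
```

On the camera sentence, the only dependency this adds is nn(quality-5, picture-4), which is the extra nn in question.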
Other comments:
- Where are the square brackets (-LSB-) coming from? They shouldn't be introduced by the tokenizer; if they are, it's a bug. Can you say what you do that causes them to be generated? I suspect they may be coming from your preprocessing. Unexpected things like that in a sentence tend to degrade parse quality very badly.
- The online parser isn't always up to date with the latest released version, and I'm not sure whether it is right now. But I don't think that is the main issue here.
- We are doing some work evolving the dependencies representation. This is deliberate, but it will create problems if you have code that depends substantively on how the dependencies were defined in an older version. We would be interested to know (perhaps by email to the parser-user list) if your accuracy went down for reasons other than your code expecting the dependency names as they were in an earlier version.
Example of difference using the command line:
[manning]$ cat > camera.txt
The size and picture quality of the camera is perfect.
[manning]$ java edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat typedDependencies -outputFormatOptions collapsedDependencies edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz camera.txt
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [2.4 sec].
Parsing file: camera.txt
Parsing [sent. 1 len. 11]: The size and picture quality of the camera is perfect .
det(quality-5, The-1)
nn(quality-5, size-2)
conj_and(size-2, picture-4)
nsubj(perfect-10, quality-5)
det(camera-8, the-7)
prep_of(quality-5, camera-8)
cop(perfect-10, is-9)
root(ROOT-0, perfect-10)
Parsed file: camera.txt [1 sentences].
Parsed 11 words in 1 sentences (6.94 wds/sec; 0.63 sents/sec).
[manning]$ java edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat typedDependencies -outputFormatOptions CCPropagatedDependencies edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz camera.txt
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [2.2 sec].
Parsing file: camera.txt
Parsing [sent. 1 len. 11]: The size and picture quality of the camera is perfect .
det(quality-5, The-1)
nn(quality-5, size-2)
conj_and(size-2, picture-4)
nn(quality-5, picture-4)
nsubj(perfect-10, quality-5)
det(camera-8, the-7)
prep_of(quality-5, camera-8)
cop(perfect-10, is-9)
root(ROOT-0, perfect-10)
Parsed file: camera.txt [1 sentences].
Parsed 11 words in 1 sentences (12.85 wds/sec; 1.17 sents/sec).
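If you want to check programmatically which dependencies propagation adds, one option is to diff the two outputs. The following is a rough sketch under my own assumptions: the class and method names (DependencyDiffSketch, dependencyLines, addedBy) are hypothetical, and it assumes each dependency is on its own line in the rel(gov-N, dep-M) form shown above, with log lines filtered out by a simple pattern.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the Stanford Parser.
final class DependencyDiffSketch {

    // Keep only lines that look like rel(gov-N, dep-M); skip log lines.
    static Set<String> dependencyLines(String parserOutput) {
        Set<String> deps = new LinkedHashSet<>();
        for (String line : parserOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.matches("[A-Za-z_]+\\(\\S+, \\S+\\)")) {
                deps.add(trimmed);
            }
        }
        return deps;
    }

    // Dependencies present in the propagated output but not in the collapsed one.
    static List<String> addedBy(String collapsedOutput, String propagatedOutput) {
        Set<String> added = dependencyLines(propagatedOutput);
        added.removeAll(dependencyLines(collapsedOutput));
        return new ArrayList<>(added);
    }

    public static void main(String[] args) {
        // Abbreviated versions of the two outputs shown above.
        String collapsed = "Parsing file: camera.txt\n"
            + "nn(quality-5, size-2)\n"
            + "conj_and(size-2, picture-4)\n";
        String propagated = "Parsing file: camera.txt\n"
            + "nn(quality-5, size-2)\n"
            + "conj_and(size-2, picture-4)\n"
            + "nn(quality-5, picture-4)\n";
        System.out.println(addedBy(collapsed, propagated));
    }
}
```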