Question

I am working with the Stanford parser to extract grammatical dependency structures from review sentences. My problem is that, for some reason, the output generated by my code is not the same as the output generated by the Stanford online tool. Below is an example.

Review Sentence: The picture quality of the camera is not good.

My code output (it uses the EnglishPCFG model and the typedDependenciesCollapsed structure):

root(ROOT-0, -LSB--1), 
det(quality-4, The-2), 
nn(quality-4, picture-3),
nsubj(-RSB--11, quality-4), 
det(camera-7, the-6), 
prep_of(quality-4, camera-7), 
cop(-RSB--11, is-8), 
neg(-RSB--11, not-9), 
amod(-RSB--11, good-10), 
ccomp(-LSB--1, -RSB--11)

Stanford Online tool Output:

det(quality-3, The-1)
nn(quality-3, picture-2)
nsubj(good-9, quality-3)
det(camera-6, the-5)
prep_of(quality-3, camera-6)
cop(good-9, is-7)
neg(good-9, not-8)
root(ROOT-0, good-9)

I am looking for the reason for this difference. What kind of model and dependency structure does the online parser use? I apologize if I am missing something obvious. Any help would be highly appreciated.

I can add a code snippet if required.

Update:

I changed my code to ignore the -LSB- and -RSB- tokens generated by the Stanford Parser tokenizer, but the grammatical structure generated still differs from that of the online tool. Here is an example:

Review Sentence: The size and picture quality of the camera is perfect.

My Code Output:

det(quality-5, The-1), 
nn(quality-5, size-2), 
conj_and(size-2, picture-4),
nsubj(perfect-10, quality-5), 
det(camera-8, the-7), 
prep_of(quality-5, camera-8), 
cop(perfect-10, is-9), 
root(ROOT-0, perfect-10)

Stanford Online Tool Output:

det(quality-5, The-1)
nn(quality-5, size-2)
conj_and(size-2, picture-4)
**nn(quality-5, picture-4)**
nsubj(perfect-10, quality-5)
det(camera-8, the-7)
prep_of(quality-5, camera-8)
cop(perfect-10, is-9)
root(ROOT-0, perfect-10)

Note the missing nn dependency in my code output. I am trying to get my head around why this is happening. Any help would be appreciated.

Update (Relevant code snippet below):

// rawWords2 is the tokenized sentence, including the spurious bracket tokens:
// [-LSB-, The, size, and, picture, quality, of, the, camera, is, perfect, -RSB-]

LexicalizedParser lp = ... // loaded from the EnglishPCFG model

// Drop the leading -LSB- and trailing -RSB- tokens before parsing
Tree parse = lp.apply(rawWords2.subList(1, rawWords2.size() - 1));

TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);

tdl = (List<TypedDependency>) gs.typedDependenciesCollapsed();
System.out.println(tdl.toString());

Output to screen is as mentioned earlier in the post.

Another observation:

I worked with the Stanford library to show me the dependency relation between quality and picture. The Stanford online tool shows it as nn, but the dependency reported by the library is dep (i.e. it can't find a more suitable dependency). The question, then, is why the Stanford online tool shows an nn dependency between quality and picture whereas the Stanford library shows dep.


Solution

The major issue for whether you get that extra nn dependency or not is whether there is propagation of dependencies across coordination (size is an nn of quality and it is coordinated with picture, therefore we make picture an nn of quality too). The online output shows the collapsed output with propagation, whereas you are calling the API method that doesn't include propagation. You can see either output from the command line using the options shown at the bottom of this post. In the API, to get coordination propagation, you should instead call

gs.typedDependenciesCCprocessed()

(instead of gs.typedDependenciesCollapsed()).
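The propagation rule itself can be illustrated with a small self-contained sketch. Note that the Dep class and propagate method below are hypothetical stand-ins of my own, not the Stanford API; this only mimics the observable effect of CC propagation (if reln(gov, a) holds and a is coordinated with b via conj_*, also emit reln(gov, b)), not its actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CCPropagation {

    // Hypothetical minimal stand-in for a typed dependency: reln(gov, dep).
    static final class Dep {
        final String reln, gov, dep;
        Dep(String reln, String gov, String dep) {
            this.reln = reln; this.gov = gov; this.dep = dep;
        }
        @Override public String toString() {
            return reln + "(" + gov + ", " + dep + ")";
        }
    }

    // Simplified sketch of coordination propagation: for every conj_* edge
    // a -> b, copy each non-conj relation reln(gov, a) to reln(gov, b).
    static List<Dep> propagate(List<Dep> deps) {
        List<Dep> out = new ArrayList<>(deps);
        for (Dep conj : deps) {
            if (!conj.reln.startsWith("conj_")) continue;
            for (Dep d : deps) {
                if (d.dep.equals(conj.gov) && !d.reln.startsWith("conj_")) {
                    out.add(new Dep(d.reln, d.gov, conj.dep));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Dep> deps = Arrays.asList(
            new Dep("nn", "quality-5", "size-2"),
            new Dep("conj_and", "size-2", "picture-4"));
        // Propagation adds nn(quality-5, picture-4), matching the online tool.
        System.out.println(propagate(deps));
    }
}
```

Running this prints the original two dependencies plus the propagated nn(quality-5, picture-4), which is exactly the edge missing from your collapsed (non-propagated) output.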

Other comments:

  • Where are the square brackets (-LSB-) coming from? They shouldn't be introduced by the tokenizer; if they are, it's a bug. Can you say how they come to be generated? I suspect they may be coming from your preprocessing. Unexpected things like that in a sentence will tend to degrade parse quality very badly.
  • The online parser isn't always up-to-date with the latest released version. I'm not sure if it is up-to-date right now. But I don't think that is the main issue here.
  • We are doing some work evolving the dependencies representation. This is deliberate, but it will create problems if you have code that depends substantively on how the dependencies were defined in an older version. We would be interested to know (perhaps by email to the parser-user list) if your accuracy was coming down for reasons other than that your code was written to expect the dependency names as they were in an earlier version.

Example of difference using the command line:

[manning]$ cat > camera.txt 
The size and picture quality of the camera is perfect.
[manning]$ java edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat typedDependencies -outputFormatOptions collapsedDependencies edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz camera.txt
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [2.4 sec].
Parsing file: camera.txt
Parsing [sent. 1 len. 11]: The size and picture quality of the camera is perfect .
det(quality-5, The-1)
nn(quality-5, size-2)
conj_and(size-2, picture-4)
nsubj(perfect-10, quality-5)
det(camera-8, the-7)
prep_of(quality-5, camera-8)
cop(perfect-10, is-9)
root(ROOT-0, perfect-10)

Parsed file: camera.txt [1 sentences].
Parsed 11 words in 1 sentences (6.94 wds/sec; 0.63 sents/sec).
[manning]$ java edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat typedDependencies -outputFormatOptions CCPropagatedDependencies edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz camera.txt
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [2.2 sec].
Parsing file: camera.txt
Parsing [sent. 1 len. 11]: The size and picture quality of the camera is perfect .
det(quality-5, The-1)
nn(quality-5, size-2)
conj_and(size-2, picture-4)
nn(quality-5, picture-4)
nsubj(perfect-10, quality-5)
det(camera-8, the-7)
prep_of(quality-5, camera-8)
cop(perfect-10, is-9)
root(ROOT-0, perfect-10)

Parsed file: camera.txt [1 sentences].
Parsed 11 words in 1 sentences (12.85 wds/sec; 1.17 sents/sec).

Other tips

From my observations, it seems that the Stanford online parser still uses an older version at its backend.

I have been using the Stanford parser for a year now. We used version 3.2.0 for a long time. When version 3.3.0 was released with the additional feature of sentiment analysis, I tried the newer version, but its dependencies were observed to vary slightly from those of version 3.2.0, and the accuracy of our product came down.

If your requirement is just to extract dependencies, and not to use sentiment analysis, I would suggest using version 3.2.0.

Check the end of this page to download earlier versions of the parser.

License: CC-BY-SA with attribution