Question

I love Stanford CoreNLP; by now it is accurate enough for my NLP needs. The problem is that analyzing massive amounts of text (say, millions of sentences) takes days.

Are there alternative Java implementations that sacrifice some accuracy for efficiency while (ideally) providing the same API?


Solution

If you are using the PCFG or factored models, consider switching to the newer RNN models available since version 3.2; they are much faster.
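If you run the parser through the CoreNLP pipeline, the model is selected via the `parse.model` property, so switching models is a one-line configuration change. A minimal sketch of a pipeline properties file; the model path below is an assumption, so verify it against the models jar that ships with your CoreNLP version:

```
# CoreNLP pipeline configuration (sketch)
annotators = tokenize, ssplit, pos, parse
# Assumed path to the RNN parser model -- check your models jar for the exact file
parse.model = edu/stanford/nlp/models/lexparser/englishRNN.ser.gz
```

The same mechanism works for any bundled parser model, which makes it easy to benchmark speed versus accuracy on a sample of your data before committing to a full run.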

Alternatively, if you only require dependencies, there are other parsers you could try (e.g. the mate-tools parser or the ClearNLP dependency parser). If you need constituents, you could try the Berkeley parser.

As far as I know, there are no other parser implementations that share the Stanford Parser API. However, there are collections that offer a fairly uniform API over different parsers, e.g. DKPro Core or ClearTK.

Disclosure: I am a developer on the DKPro Core project.

Licensed under: CC-BY-SA with attribution