Does anyone know of a good quick-and-dirty text/grammar parser?
29-09-2019
Question
I have a "mad lib" scenario in which I want to
a) determine the part of speech of every word (or most words) in a sentence, and
b) have the user select alternatives to those words, or replace them computationally with equivalent words.
I looked at the Stanford Parser, but it's a bit slow ... any suggestions?
Solution
Use a POS tagger
If you're just using the part-of-speech (POS) tags and not the parse trees, you don't actually need to use a parser. Instead, you can just use a standalone POS tagger.
POS tagging is much faster than phrase-structure parsing. On a Xeon E5520, the Stanford POS tagger can tag 1700 sentences in 3 seconds, while the same data takes about 10 minutes to parse using the Stanford Parser (Cer et al. 2010).
There's a fairly comprehensive list of other POS taggers here.
OTHER TIPS
For a toolkit approach, there's the NLTK toolkit. It's in Python, so like-for-like speed might not be quite what you want; but being a toolkit intended for teaching, it lets you implement a lot of different approaches. I.e., it might be easy to implement a quick parser/tagger even though the underlying language might not be the fastest available.