Question

I'm exploring the Stanford Temporal Tagger (SUTime) for my project, to extract date entities from text. The demo at http://nlp.stanford.edu:8080/sutime/process seems promising. I would like to understand whether this library is mature. Could somebody also help me understand how it performs on big data? It would also be helpful if you could point me to other Java-based temporal tagging libraries, especially ones suited to big-data requirements. Is there any Apache project that does temporal tagging?

I found some libraries, such as:

https://code.google.com/p/heideltime/

https://code.google.com/p/stemptag/


Solution

Yes, the SUTime library is mature and quite accurate, and has been run over tens of millions of words of text. (Just make sure you are not invoking the more expensive and much slower parts of Stanford CoreNLP, namely parsing and dcoref, which are not needed for temporal tagging.)
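As a rough illustration of that advice, here is a minimal sketch, assuming a CoreNLP release in which the ner annotator runs SUTime (controlled by the ner.useSUTime property, which is on by default). It only builds the Properties you would hand to the pipeline, listing the annotators ner depends on and leaving out parse and dcoref, plus a small hypothetical helper, usesExpensiveAnnotators, that flags a configuration pulling in those slower stages. The class name SuTimeConfig is made up for illustration, and the code uses only the JDK, so the actual CoreNLP call appears only as a comment.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical helper: Properties for a CoreNLP pipeline that only needs
 * temporal tagging. Assumes the "ner" annotator runs SUTime
 * (ner.useSUTime, true by default in the CoreNLP versions assumed here).
 */
final class SuTimeConfig {

    // Stages not needed for temporal tagging (parsing and coreference).
    private static final List<String> EXPENSIVE =
            Arrays.asList("parse", "depparse", "dcoref", "coref");

    /** Minimal pipeline: "ner" needs tokenize, ssplit, pos and lemma before it. */
    static Properties minimalProperties() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
        props.setProperty("ner.useSUTime", "true");
        return props;
    }

    /** Returns true if the "annotators" list contains any of the stages in EXPENSIVE. */
    static boolean usesExpensiveAnnotators(Properties props) {
        String annotators = props.getProperty("annotators", "");
        for (String name : annotators.split(",")) {
            if (EXPENSIVE.contains(name.trim())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Properties props = minimalProperties();
        System.out.println(props.getProperty("annotators"));
        System.out.println("expensive stages enabled: " + usesExpensiveAnnotators(props));
        // With CoreNLP on the classpath, these props would then be passed to
        // new edu.stanford.nlp.pipeline.StanfordCoreNLP(props).
    }
}
```

Running main prints the annotator list followed by "expensive stages enabled: false". Checking the configuration this way before a large batch run is cheap insurance against accidentally paying for a full parse of every sentence.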

HeidelTime is another very good Java library for temporal tagging. Its advantage is that it supports several languages, whereas SUTime at present only supports English. Its disadvantage is that it comes configured to use TreeTagger as its part-of-speech tagger, which means you either have to work with this non-open-source, non-Java component or write your own code to configure it to use some other POS tagger. I'm not familiar with stemptag, and I don't think there is any Apache project for this.

Licensed under: CC-BY-SA with attribution