Question

I am looking for a simple but "good enough" Named Entity Recognition library (and dictionary) for Java. I want to process emails and documents and extract some basic information such as names, places, addresses, and dates.

I've been looking around, and most options seem to be on the heavy side: full-blown NLP projects.

Any recommendations?

Solution 2

BTW, I recently ran across OpenCalais, which seems to have the functionality I was looking for.

OTHER TIPS

You might want to have a look at one of my earlier answers to a similar problem.

Other than that, most lighter NER systems depend a lot on the domain used. You will find a whole lot of tools and papers about biomedical NER systems, for example. In addition to my previous post (which already contains my main recommendation if you want to do NER), here are some more tools you might want to look into:

  • The Stanford CRF-NER
  • The Postech Biomedical NER System if you are interested in this particular domain
  • OpenCalais seems to be a commercial system. There are UIMA wrappers for OpenCalais, but they seem dated. There is also a dictionary-based Context-Mapper annotator for UIMA that may help you out. Be aware that UIMA comes with a significant learning curve ;-)
  • OpenNLP also has an NER tool.
  • Balie does NER, too, among other things.
  • ABNER does NER, but again it's focused on the biomedical domain.
  • The JULIE Lab Tools from the University of Jena, Germany, also do NER. They have standalone versions and UIMA analysis engines.
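
To give an idea of what a dictionary-based annotator does, here is a minimal sketch in plain Java. It is not the API of any of the tools listed above: the class name, the tiny dictionary, and the type labels are all made up for illustration, and it assumes the text has already been split into tokens. It simply tries the longest phrase first at each position and looks it up in a map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical gazetteer lookup; not taken from any real NER library.
class GazetteerSketch {

    // Made-up mini-dictionary: lower-cased surface form -> entity type.
    static final Map<String, String> DICTIONARY = new HashMap<>();
    static {
        DICTIONARY.put("anna", "NAME");
        DICTIONARY.put("paris", "PLACE");
        DICTIONARY.put("new york", "PLACE");
    }

    // Longest dictionary entry, measured in tokens.
    static final int MAX_PHRASE_LENGTH = 2;

    // Returns "TYPE:phrase" for every dictionary hit, scanning left to right.
    static List<String> annotate(String[] tokens) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            boolean matched = false;
            // Try the longest candidate phrase first so "new york" beats "new".
            for (int len = Math.min(MAX_PHRASE_LENGTH, tokens.length - i); len >= 1; len--) {
                String phrase = String.join(" ", Arrays.copyOfRange(tokens, i, i + len))
                        .toLowerCase(Locale.ROOT);
                String type = DICTIONARY.get(phrase);
                if (type != null) {
                    found.add(type + ":" + phrase);
                    i += len;      // skip past the matched phrase
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                i++;               // no entry starts here, move on
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String[] tokens = {"Anna", "flew", "to", "New", "York"};
        System.out.println(annotate(tokens));
    }
}
```

A lookup like this only finds what is in the dictionary and cannot disambiguate (is "Paris" a place or a person?), which is where the statistical tools above earn their overhead.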

One additional remark: you won't get away without tokenizing the input. Tokenization of natural language is slightly non-trivial, which is why I suggest you use a toolbox that does both for you.
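
As a rough illustration of why tokenization is not just splitting on spaces, the sketch below compares a naive whitespace split with the JDK's own `java.text.BreakIterator`. The class and method names are my own, chosen for this example only. With the naive split, punctuation stays glued to the words, so a token like `Paris,` would never match a plain `Paris` entry; the word iterator separates the two. Neither is a substitute for the tokenizer that ships with a real NER toolkit.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative comparison only; the class and method names are hypothetical.
class TokenizeSketch {

    // Naive approach: split on runs of whitespace; punctuation stays attached.
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // JDK word boundaries: punctuation becomes its own token.
    static List<String> breakIteratorTokens(String text) {
        List<String> tokens = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            // The iterator also reports the whitespace between words; skip it.
            if (!piece.trim().isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Meet Anna in Paris, France.";
        System.out.println(whitespaceTokens(text));
        System.out.println(breakIteratorTokens(text));
    }
}
```

Even the second variant is only a starting point: abbreviations, e-mail addresses, and dates in running text are exactly the cases where a dedicated tokenizer does better.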

You might want to try Alchemy API as well. It's similar to OpenCalais.

Licensed under: CC-BY-SA with attribution