Question

In natural language processing, named-entity recognition is the challenge of, well, recognizing named entities such as organizations, places, and, most importantly, people's names.

There is a major challenge here, though, which I call synonymy: "The Count" and "Dracula" refer to the same person, but it is possible that this is never stated directly in the text.

What would be the best algorithm to resolve these synonyms?


If there is a feature for this in any Python-based library, I'm eager to be educated. I'm using NLTK.


Solution

You are describing a problem of coreference resolution and named entity linking. I'm providing separate links as I am not entirely sure which one you meant.

  • Coreference: Stanford CoreNLP currently has one of the best implementations, but it is written in Java. I have used the Python bindings and wasn't too happy with them; I ended up running all my data through the Stanford pipeline just once and then loading the processed XML files in Python. Obviously, that doesn't work if you have to process data in real time.
  • Named entity linking: Check out Apache Stanbol and the links in the following Stack Overflow post.
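As a rough illustration of the "run the pipeline once, load the XML in Python" workflow, here is a minimal sketch using only the standard library. It assumes the CoreNLP XML output nests coreference chains as `document/coreference/coreference/mention`, with `representative="true"` on the canonical mention; check this against the files your CoreNLP version actually produces. The sample XML and the helper names `load_coref_chains` and `alias_map` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-written sample in the layout ASSUMED for CoreNLP's
# XML coreference output; verify against your own processed files.
SAMPLE_XML = """<root>
  <document>
    <coreference>
      <coreference>
        <mention representative="true">
          <sentence>1</sentence><start>1</start><end>3</end><head>2</head>
          <text>The Count</text>
        </mention>
        <mention>
          <sentence>2</sentence><start>1</start><end>2</end><head>1</head>
          <text>Dracula</text>
        </mention>
      </coreference>
    </coreference>
  </document>
</root>"""


def load_coref_chains(root):
    """Return a list of (representative_text, [mention_texts]) tuples."""
    chains = []
    # Each inner <coreference> element is one chain of co-referring mentions.
    for chain in root.findall("./document/coreference/coreference"):
        mentions = chain.findall("mention")
        texts = [m.findtext("text") for m in mentions]
        # Prefer the mention flagged as representative; fall back to the first.
        rep = next(
            (m.findtext("text") for m in mentions
             if m.get("representative") == "true"),
            texts[0] if texts else None,
        )
        chains.append((rep, texts))
    return chains


def alias_map(chains):
    """Map every mention's surface text to its chain's representative text."""
    mapping = {}
    for rep, texts in chains:
        for text in texts:
            mapping[text] = rep
    return mapping


if __name__ == "__main__":
    # For real output files, use ET.parse(path).getroot() instead.
    root = ET.fromstring(SAMPLE_XML)
    for rep, texts in load_coref_chains(root):
        print(rep, "<-", texts)  # The Count <- ['The Count', 'Dracula']
```

The alias map lets you collapse "Dracula" onto "The Count" before counting or comparing entities. Note that this only merges mentions CoreNLP linked within a single document; linking names across documents is the entity-linking problem instead.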
Licensed under: CC-BY-SA with attribution