Question

I want to extract terminological units from a corpus of specialized documents. Is there an algorithm or out-of-the-box solution for this? Can NLTK do it?

It seems this thread already addresses my question: "Extracting terms with contextual relevance (noun phrases) from text in a .NET project".


Solution

The description of what you want isn't very clear. To get better help, you should probably also post an example.

It sounds like what you're looking for is called Named Entity Recognition (NER). Depending on exactly what you want (and on your data), existing systems can work very well, but the problem is by no means solved in general. If this is what you want, important systems to look at are GATE, Apache OpenNLP, and even NLTK.
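If it helps to have something to compare those toolkits against, here is a rough, toolkit-independent baseline for collecting candidate terms: count multi-word n-grams that contain no stopwords and keep the frequent ones. This is only a sketch, not how GATE, OpenNLP or NLTK work internally. The function name `candidate_terms`, its parameters and the tiny stopword list are all made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real one would be much larger).
STOPWORDS = {
    "a", "an", "the", "of", "in", "on", "for", "to", "and", "or", "is", "are",
    "was", "were", "by", "with", "that", "this", "it", "as", "at", "from", "be",
}


def candidate_terms(documents, max_len=3, min_count=2):
    """Return (term, count) pairs for stopword-free n-grams of 2..max_len words.

    Hypothetical helper: frequency is a crude stand-in for "termhood".
    """
    counts = Counter()
    for text in documents:
        # Split on punctuation so that n-grams never cross clause boundaries.
        for chunk in re.split(r"[.,;:!?()\n]+", text.lower()):
            tokens = re.findall(r"[a-z][a-z\-]*", chunk)
            for n in range(2, max_len + 1):
                for i in range(len(tokens) - n + 1):
                    gram = tokens[i:i + n]
                    # Skip any n-gram that contains a stopword.
                    if any(tok in STOPWORDS for tok in gram):
                        continue
                    counts[" ".join(gram)] += 1
    # Keep only n-grams seen at least min_count times, most frequent first.
    return [(term, c) for term, c in counts.most_common() if c >= min_count]


if __name__ == "__main__":
    docs = [
        "The neural network is trained by gradient descent.",
        "Gradient descent updates the neural network weights.",
    ]
    for term, count in candidate_terms(docs):
        print(count, term)
```

On a real corpus you would probably want part-of-speech filtering (noun phrases only) and a comparison against a general-language corpus, which is where the toolkits above come in.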

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow