Question

I am trying to classify text and then map the nouns on to a person, place, or a thing. Is there a way or dictionary to do that?

Solution 2

Since you are dealing with classification, it may be worth having a look at AlchemyAPI (http://www.alchemyapi.com/products/features/). A free API key is available for experimenting.

But it doesn't stop there. If you want to do it manually, WordNet, which is mentioned in @tripleee's answer, is also worth a look, and there are APIs for interacting with WordNet from languages such as Java.
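As a rough illustration of the WordNet route, here is a minimal Python sketch that uses NLTK's WordNet interface instead of a Java API. It relies on WordNet's lexicographer file names (such as `noun.person` and `noun.location`). The three-way mapping and the function names are assumptions made for this example, and taking only the first listed sense is a crude shortcut.

```python
# Hypothetical mapping from WordNet lexicographer files to the three buckets.
# Every noun category not listed here falls through to "thing".
LEXNAME_TO_BUCKET = {
    "noun.person": "person",
    "noun.location": "place",
}


def bucket_for_lexname(lexname):
    """Map a WordNet lexicographer file name to person/place/thing."""
    return LEXNAME_TO_BUCKET.get(lexname, "thing")


def classify_noun(word):
    """Classify a noun by its first listed WordNet noun sense.

    Returns None when WordNet has no noun entry for the word.
    """
    # Imported lazily so the mapping above can be used without NLTK installed.
    from nltk.corpus import wordnet as wn

    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return None
    return bucket_for_lexname(synsets[0].lexname())
```

`classify_noun` needs the WordNet data, which NLTK fetches with `nltk.download("wordnet")`. Ambiguous words (a name that is both a person and a city, say) will need something smarter than the first sense.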

Beyond that, there are ontologies, many of them free, which are normally published in OWL or RDF. You can query these ontologies to find the relevant information. For OWL ontologies you can use the OWL API, and for RDF ontologies you can use Apache Jena and write SPARQL queries.

There is also DBpedia, which I believe could solve a large part of your problem. DBpedia is structured information extracted from Wikipedia and published in machine-readable form.

For example, you can write SPARQL queries (much like SQL statements). Suppose I want to check the relationship between London and the UK:

PREFIX : <http://dbpedia.org/resource/>
SELECT ?property
WHERE {
  :London ?property :United_Kingdom
}

Or suppose that I want to list cities together with their countries:

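PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?city ?country
WHERE {
  ?city rdf:type dbpedia-owl:City ;
        rdfs:label ?label ;
        dbpedia-owl:country ?country
}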
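If you would rather send such a query from a script than through Jena, the following is a minimal Python sketch using only the standard library. It assumes the public DBpedia endpoint at `https://dbpedia.org/sparql` is reachable and returns the standard SPARQL JSON results format. The helper names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Public DBpedia endpoint (assumption: it is up and accepts GET requests).
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?property WHERE { dbr:London ?property dbr:United_Kingdom }
"""


def build_url(query, endpoint=DBPEDIA_ENDPOINT):
    """Encode the query as a GET URL that asks for JSON results."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    return f"{endpoint}?{params}"


def extract_values(results, variable):
    """Pull the plain values bound to one variable out of SPARQL JSON results."""
    bindings = results["results"]["bindings"]
    return [row[variable]["value"] for row in bindings if variable in row]


def run_query(query):
    """Send the query over the network and return the parsed JSON response."""
    with urllib.request.urlopen(build_url(query), timeout=30) as response:
        return json.load(response)
```

Calling `extract_values(run_query(QUERY), "property")` should list the properties that link the two resources, if any exist in the current DBpedia data.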

OTHER TIPS

I am surprised that Named Entity Recognition and Named Entity Linking have not been mentioned. It sounds to me like this is exactly what you are asking for. Here is an example. Suppose you had the following document:

Obama flew to Japan yesterday.

Recognising the named entities in this document amounts to figuring out that

Obama/PERSON flew to Japan/LOCATION yesterday.

Linking these named entities to a knowledge base (e.g. Wikipedia or Freebase) gives you:

Obama/PERSON -> http://en.wikipedia.org/wiki/Barack_Obama
Japan/LOCATION -> http://en.wikipedia.org/wiki/Japan

There are many standard tools that recognise or link named entities. In general, recognition is easier and you can expect to get pretty reasonable performance out of the box. Of course, if your data is very domain-specific, you can get much better accuracy by training your own model on data from the same domain.
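To make the recognition step concrete, here is a minimal sketch using spaCy, which is one such off-the-shelf tool (my choice for illustration, not the only option). It assumes spaCy and its small English model `en_core_web_sm` are installed. The grouping of entity labels into person/place/thing is an assumption made for this example.

```python
# Hypothetical grouping of spaCy's English entity labels into three buckets.
LABEL_TO_BUCKET = {
    "PERSON": "person",
    "GPE": "place",  # countries, cities, states
    "LOC": "place",  # non-political locations such as mountains or seas
    "FAC": "place",  # buildings, airports, bridges
}


def bucket_for_label(label):
    """Map an entity label to person/place/thing; unknown labels become 'thing'."""
    return LABEL_TO_BUCKET.get(label, "thing")


def tag_entities(text, model_name="en_core_web_sm"):
    """Return (entity text, bucket) pairs for the entities spaCy finds."""
    # Imported lazily so the mapping above can be used without spaCy installed.
    import spacy

    nlp = spacy.load(model_name)
    return [(ent.text, bucket_for_label(ent.label_)) for ent in nlp(text).ents]
```

Note that the model also tags dates, amounts and similar spans as entities, and this mapping puts all of them under "thing". You may prefer to drop those labels instead.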

What you are looking for is subcategorization and there are dictionaries for that, but I doubt you can find one which implements your ad-hoc three-way subcategories (even assuming you want to include e.g. "awkwardness" and "gender" in the "thing" subcategory).

Proper names vs. regular nouns is probably feasible by the simple heuristic of capitalization; maybe something like WordNet or Wiktionary could help sort out places vs. persons within the proper names?
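A minimal sketch of the capitalization heuristic in plain Python follows. The function name, the naive sentence splitter, and the decision to treat sentence-initial words as ambiguous are all assumptions made for this example.

```python
import re


def proper_noun_candidates(text):
    """Split capitalized tokens into (confident, ambiguous) lists.

    A capitalized word in the middle of a sentence is a likely proper name.
    A capitalized word at the start of a sentence could be either, so it is
    reported separately.
    """
    confident, ambiguous = [], []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
        for position, token in enumerate(tokens):
            if not token[0].isupper():
                continue
            if position == 0:
                ambiguous.append(token)
            else:
                confident.append(token)
    return confident, ambiguous


confident, ambiguous = proper_noun_candidates(
    "Obama flew to Japan yesterday. The plane landed in Tokyo."
)
print(confident)  # ['Japan', 'Tokyo']
print(ambiguous)  # ['Obama', 'The']
```

The ambiguous list shows the limit of the heuristic: "Obama" and "The" look the same to it, so sentence-initial words need a dictionary lookup or a tagger to resolve.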

You may also want to look into lexicon acquisition, i.e. building a subcategorization dictionary of your own by automated or semi-automated means. Maybe look at a tagged corpus like Brown and analyze how persons appear in different grammatical roles than places?
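One very rough way to start on that idea, sketched in Python: count how often each proper noun directly follows a locative preposition, on the assumption that places do so more often than persons. The cue-word list, the function names and the majority rule are all assumptions made for this example. The only corpus-specific fact it uses is that Brown tags proper nouns with tags beginning with `NP`.

```python
from collections import Counter, defaultdict

# Assumed cue words; a real study would derive these from the data.
LOCATIVE_PREPOSITIONS = {"in", "at", "to", "from", "near"}


def context_profile(tagged_sentences):
    """Count, per proper noun, how often it follows a locative preposition."""
    profile = defaultdict(Counter)
    for sentence in tagged_sentences:
        for i, (word, tag) in enumerate(sentence):
            if not tag.startswith("NP"):  # Brown proper-noun tags: NP, NP$, NPS, ...
                continue
            previous = sentence[i - 1][0].lower() if i > 0 else ""
            cue = "locative" if previous in LOCATIVE_PREPOSITIONS else "other"
            profile[word][cue] += 1
    return profile


def guess_bucket(counts):
    """Crude majority rule: mostly after locative prepositions means 'place'."""
    return "place" if counts["locative"] > counts["other"] else "person"


def profile_brown():
    """Build the profile from the Brown corpus (needs nltk.download('brown'))."""
    from nltk.corpus import brown

    return context_profile(brown.tagged_sents())
```

`context_profile` accepts any iterable of sentences given as `(word, tag)` pairs, so it can be tried on a small hand-tagged sample before running it over the whole corpus. Expect plenty of noise: "to" is also the infinitive marker, and persons do follow "to" and "from".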

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow