Question

I'm looking for code, a product, or a service that does semantic analysis of text (sentences and/or paragraphs) to categorize it by general topic, e.g.

  • Finance
  • Entertainment
  • Technology
  • Business
  • Art
  • etc...

Solution

If you have a bunch of examples that have already been categorised, you can use them to train a classifier. This is a very simple document classification problem, and any suite of machine learning tools will have the algorithms and tutorials for it. For instance, check out Weka: http://www.cs.waikato.ac.nz/ml/weka/

or RapidMiner: http://rapid-i.com/content/blogcategory/38/69/

If your needs are limited, and you just want a simple API, you cannot go wrong with this Naive Bayes library: https://ci-bayes.dev.java.net/
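To give a feel for what such a classifier does with already-categorised examples, here is a minimal sketch of multinomial Naive Bayes with add-one smoothing in plain Python. It is not the API of the library linked above or of Weka/RapidMiner; the class name, the whitespace tokenizer, and the tiny training set are all made up for illustration, and a real system would need far more training text per category.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Hypothetical helper: multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.doc_counts = Counter()               # label -> number of training texts
        self.word_counts = defaultdict(Counter)   # label -> word -> occurrences
        self.vocabulary = set()

    def train(self, labelled_texts):
        for text, label in labelled_texts:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        words = tokenize(text)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each word.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training examples, two per category, purely for illustration.
TRAINING = [
    ("stocks and bonds fell as interest rates rose", "Finance"),
    ("the bank raised its loan interest rates", "Finance"),
    ("the new movie premiere drew a huge audience", "Entertainment"),
    ("the band released a new album and tour dates", "Entertainment"),
    ("the new smartphone has a faster processor", "Technology"),
    ("the software update fixes a security bug", "Technology"),
]

classifier = NaiveBayesTextClassifier()
classifier.train(TRAINING)
print(classifier.classify("interest rates on bonds"))
print(classifier.classify("a security bug in the processor software"))
```

Working in log space avoids multiplying many small probabilities together, and the add-one smoothing keeps a word that was never seen for a category from zeroing out that category entirely.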

Good luck!

OTHER TIPS

If you want to evaluate a commercial service API, check out the VIKI engine APIs: http://www.softwareevolution.it/en/products/viki-core-api.html

It is an easy-to-use JSON service API with specific semantic features.

Would this be of any help to you?

http://en.wikipedia.org/wiki/Document_classification

It's not a finished product or service, nor is it code, but it describes the various algorithms that can be used for semantic analysis. Googling a bit further, I believe this isn't really out of the laboratory yet. People are mostly experimenting with KNN (k-nearest-neighbour) algorithms, which produces cool stuff, but not really what you need:

http://www.ebi.ac.uk/webservices/whatizit/info.jsf
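For reference, the KNN idea itself is simple when applied to topic labelling: find the k labelled texts most similar to the new one and let them vote. The sketch below is only an illustration of that idea, not how the service linked above works; the bag-of-words vectors, the cosine similarity measure, the function names, and the example texts are all my own assumptions.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a text into a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(text, labelled_texts, k=3):
    """Label `text` by majority vote among its k most similar examples."""
    query = vectorize(text)
    scored = sorted(
        ((cosine_similarity(query, vectorize(doc)), label)
         for doc, label in labelled_texts),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Invented examples, two per category, purely for illustration.
EXAMPLES = [
    ("stocks and bonds fell as interest rates rose", "Finance"),
    ("the bank raised its loan interest rates", "Finance"),
    ("the new movie premiere drew a huge audience", "Entertainment"),
    ("the band released a new album and tour dates", "Entertainment"),
    ("the new smartphone has a faster processor", "Technology"),
    ("the software update fixes a security bug", "Technology"),
]

print(knn_classify("interest rates on bonds", EXAMPLES))
print(knn_classify("a security bug in the processor software", EXAMPLES))
```

Unlike a trained model, this keeps every example around and compares against all of them at query time, so it gets slow as the labelled set grows.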

But if there is some software that will do what you ask, it would be in this list:

http://www.kdnuggets.com/software/text.html

For example, the LPU program seems to be able to learn if you feed it enough training documents.

http://www.cs.uic.edu/~liub/LPU/LPU-download.html

If you're into Python or other interpreted languages, check out the excellent NLTK framework at nltk.org. It has a good how-to page and a recently published O'Reilly book.

If you're into Java and/or need a more mature but harder-to-grasp framework, try GATE instead.

Licensed under: CC-BY-SA with attribution