Frage

I have know how to communicate with twitter and how to retrieve tweets but I am looking for further working on these tweets.

I have two categories food and sports. Now I want to categorize tweets into food and sports. Can anyone please suggest me how to categorize on basis of computer algorithm?

regards Gaurav

War es hilfreich?

Lösung

I've been doing some work recently with Latent Dirichlet Allocation. The general idea is that documents contain words that are generated from topics. What you could try doing is loading a corpus of documents known to be about the topics you are interested in, update with the tweets of interest, and then select tweets that have strong probabilities for the same topics as your known documents.

I use R for LDA (package:topicmodels and package:lda), but I think there are some prebuilt python tools for this too. I would probably steer away from trying to write your own unless you have a solid grounding in Bayesian statistics.

Here's the documentation for the topicmodels package: http://cran.r-project.org/web/packages/topicmodels/vignettes/topicmodels.pdf

Andere Tipps

I doubt that a set of algorithm could possibly categorize tweets in open domain. In other words I don't think a set of rules can possibly categorizes open domain tweets. You need to parse tweets into a semantic representation customized for the categorization.

Lizenziert unter: CC-BY-SA mit Zuschreibung
Nicht verbunden mit StackOverflow
scroll top