Question

I am trying to develop a very simple program for classifying and categorising documents using various algorithms. My problem, since I am a beginner, is that I cannot find good articles or websites with simple tutorials on how to get started. I have read quite a few resources and learnt a lot, but each document or site I read uses different techniques, analyses the problem in a different way, and proposes different solutions, so I am getting confused. Are there any good resources you can point me to for getting started with an actual implementation?

I am also looking for actual test data, specifically documents that are already categorised, so I can "feed" my algorithms. Any help is appreciated. Thanks.

Solution

For Python, check out the scikit-learn tutorial on text classification. See also its demo script that runs dozens of different text classification algorithms (including Naive Bayes and SVMs) on the twenty newsgroups benchmark data set. [Disclaimer: I co-wrote these things.]
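
To give a feel for what one of those algorithms does under the hood, here is a minimal sketch of a multinomial Naive Bayes text classifier in plain Python with add-one smoothing. It is not the scikit-learn code (scikit-learn ships this kind of classifier as `MultinomialNB`), and the names `tokenize`, `train`, `predict` and the four-document `TOY_DOCS` corpus are made up for illustration only. For real experiments you would swap the toy corpus for a labelled data set such as twenty newsgroups.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a real tokenizer would do more."""
    return text.lower().split()


def train(docs):
    """Count documents per label and words per label.

    docs is a list of (text, label) pairs.
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior: fraction of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, only to make the sketch runnable.
TOY_DOCS = [
    ("the team won the match", "sports"),
    ("a late goal decided the game", "sports"),
    ("the election results were announced", "politics"),
    ("parliament passed the new law", "politics"),
]

model = train(TOY_DOCS)
print(predict(model, "who won the game"))
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once documents are longer than a few words.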

For Weka, here's a tutorial.

Licensed under: CC-BY-SA with attribution