Question

I am trying to develop a very simple program for classifying and categorising documents using various algorithms. My problem, since I am a beginner, is that I cannot find good articles or websites with simple tutorials on how to get started. I have read quite a few resources and learnt a lot, but every document or site I read uses different techniques, analyses the problem in a different way, and proposes different solutions, so I am getting confused. Are there any good resources you can point me to in order to get started with an actual implementation?

I am also looking for actual test data, specifically documents that are already categorised, so I can "feed" my algorithms. Any help is appreciated. Thanks.


Solution

For Python, check out the scikit-learn tutorial on text classification. See also its demo script that runs dozens of different text classification algorithms (including Naive Bayes and SVMs) on the twenty newsgroups benchmark data set. [Disclaimer: I co-wrote these things.]
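To give a feel for what such a scikit-learn text classifier looks like, here is a minimal sketch, not the tutorial's or the demo script's actual code. It assumes scikit-learn is installed and uses a tiny made-up corpus (`TRAIN_DOCS`, `TRAIN_LABELS` and `build_classifier` are hypothetical names for illustration) so that it runs without downloading anything. For real experiments you would swap the toy lists for a labelled data set such as the twenty newsgroups data, which scikit-learn can load via `sklearn.datasets.fetch_20newsgroups`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy, made-up training data (an assumption for illustration only).
# Replace with a real categorised corpus, e.g. the twenty newsgroups data.
TRAIN_DOCS = [
    "the team won the football match",
    "the striker scored a late goal",
    "the goalkeeper saved a penalty in the match",
    "the new laptop has a fast processor",
    "install the software update on your computer",
    "the processor and memory of the computer were upgraded",
]
TRAIN_LABELS = ["sport", "sport", "sport", "tech", "tech", "tech"]


def build_classifier():
    """Return an untrained pipeline: TF-IDF features fed into Naive Bayes."""
    # TfidfVectorizer turns raw strings into weighted word-count vectors;
    # MultinomialNB is a Naive Bayes variant suited to such count-like features.
    return make_pipeline(TfidfVectorizer(), MultinomialNB())


model = build_classifier()
model.fit(TRAIN_DOCS, TRAIN_LABELS)

# Classify two unseen documents.
predictions = model.predict(
    ["a goal in the final match", "a faster computer processor"]
)
print(list(predictions))
```

Because the vectorizer and the classifier sit in one pipeline, trying a different algorithm (for example an SVM such as `sklearn.svm.LinearSVC`) only means changing the last step, which makes it easy to compare approaches on the same data.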

For Weka, here's a tutorial.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow