Question

I have to write a classifier that separates a corpus of texts into 2 classes. The corpus is very large (about 4 million documents to classify, and 50,000 labeled documents for training). Which algorithm should I choose?

  • Naive Bayes
  • Neural networks
  • SVM
  • Random forest
  • kNN (why not?)

I have heard that random forests and SVMs are state-of-the-art methods, but maybe someone has experience with the algorithms listed above and knows which is fastest and which is most accurate?


Solution

For a two-class text classifier, I don't think you need:

(1) kNN: it is an instance-based (lazy) method that must compare every test document against the entire training set at prediction time, so classifying 4 million documents would be very slow;

(2) Random forest: decision-tree ensembles tend to be a poor fit for the high-dimensional, sparse feature spaces typical of text.

You can try:

(1) Naive Bayes: the most straightforward and easiest to code, and proven to work well on text classification problems (a minimal sketch follows this list);

(2) Logistic regression: works well when the number of training samples is much larger than the number of features;

(3) SVM: again, when training samples greatly outnumber features, an SVM with a linear kernel works about as well as logistic regression, and it is also one of the top algorithms for text classification;

(4) Neural networks: they seem like a panacea in machine learning, and in theory they can learn any model that an SVM or logistic regression could. The problem is that there are fewer mature packages for neural networks than for SVMs, so building and tuning one tends to be time-consuming.
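
As a concrete starting point, here is a minimal sketch of option (1) using scikit-learn (recommended below). The tiny `train_texts`/`train_labels` arrays are hypothetical stand-ins for your 50,000 labeled training documents:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in for the real corpus: four tiny documents, two classes.
train_texts = ["cheap pills online now", "buy cheap pills",
               "meeting moved to noon", "see you at the meeting"]
train_labels = [1, 1, 0, 0]

# TF-IDF features feeding multinomial Naive Bayes; both handle the sparse,
# high-dimensional matrices typical of text efficiently.
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(train_texts, train_labels)

print(clf.predict(["cheap meeting pills"]))  # predicted class depends on the toy data
```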

Still, it is hard to say which algorithm is best suited to your case. If you are using Python, scikit-learn includes almost all of these algorithms for you to test. Weka, which integrates many machine learning algorithms behind a user-friendly graphical interface, is also a good way to compare the performance of each algorithm. A sketch of such a comparison with scikit-learn follows.
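
This is a hedged sketch of the "test them all" advice, not a definitive benchmark: it compares three of the listed algorithms via scikit-learn's uniform estimator API, using the bundled 20 newsgroups data (restricted to two categories) as a stand-in for your own corpus. The choice of categories and estimators is purely illustrative.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Two-class subset of 20 newsgroups as a placeholder corpus;
# swap in your own list of texts and labels here.
data = fetch_20newsgroups(subset="train",
                          categories=["sci.med", "rec.autos"],
                          remove=("headers", "footers", "quotes"))

candidates = {
    "Naive Bayes": MultinomialNB(),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Linear SVM": LinearSVC(),
}

# Same TF-IDF preprocessing for each candidate, scored by 5-fold CV.
for name, estimator in candidates.items():
    pipeline = make_pipeline(TfidfVectorizer(), estimator)
    scores = cross_val_score(pipeline, data.data, data.target, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

Because all three share the same `fit`/`predict` interface, adding or removing candidates is a one-line change, which is exactly why trying several algorithms on a held-out sample of your corpus is cheap before committing to one.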
