Question

1) How can I apply feature reduction methods such as LSI in Weka for text classification?

2) Can applying feature reduction methods such as LSI improve classification accuracy?

Solution

  1. Take a look at the FilteredClassifier class or at AttributeSelectedClassifier. With FilteredClassifier you can apply a feature reduction method such as Principal Component Analysis (PCA) as a filter. Here is a video showing how to filter your dataset using PCA, so that you can try different classifiers on the reduced dataset.

  2. It can help, but there is no guarantee. If you remove redundant features, or transform the features in some way (as SVM or PCA do), the classification task can become simpler. In any case, a large number of features usually leads to the curse of dimensionality, and attribute selection is one way to mitigate it.
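
For point 1, here is a minimal sketch of how the two classes could be combined for text data. It is not the only way to do it: the class name `ReducedTextClassifier`, the `build()` helper, the choice of NaiveBayes, the 500-word limit, and the 0.95 variance threshold are all assumptions made for illustration, and the ARFF file is assumed to hold one string attribute plus a nominal class as the last attribute.

```java
import java.util.Random;

import weka.attributeSelection.PrincipalComponents;
import weka.attributeSelection.Ranker;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class ReducedTextClassifier {

    // Hypothetical helper: text -> word vector -> PCA -> NaiveBayes.
    static FilteredClassifier build() {
        // Turn the string attribute into word features (limit is an assumption;
        // PCA gets slow when the vocabulary is large).
        StringToWordVector toWords = new StringToWordVector();
        toWords.setWordsToKeep(500);

        // PCA as an attribute evaluator; keep components covering 95% of variance.
        PrincipalComponents pca = new PrincipalComponents();
        pca.setVarianceCovered(0.95);

        // PCA is an attribute transformer, so it is paired with the Ranker search.
        AttributeSelectedClassifier reduced = new AttributeSelectedClassifier();
        reduced.setEvaluator(pca);
        reduced.setSearch(new Ranker());
        reduced.setClassifier(new NaiveBayes());

        // The outer FilteredClassifier applies the text filter before reduction.
        FilteredClassifier pipeline = new FilteredClassifier();
        pipeline.setFilter(toWords);
        pipeline.setClassifier(reduced);
        return pipeline;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to an ARFF file (assumed layout: text attribute, then class).
        Instances data = DataSource.read(args[0]);
        data.setClassIndex(data.numAttributes() - 1);

        // 10-fold cross-validation of the whole pipeline.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(build(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```

Because the filter and the PCA step sit inside the classifier, they are rebuilt on the training part of each fold, so the test fold does not influence the reduction. To compare against the unreduced features, replace `reduced` with a plain `NaiveBayes` in `setClassifier` and run the same evaluation.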

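For point 2, one common way to see the curse of dimensionality is that distances lose contrast as the number of features grows. The following is a small illustration on synthetic uniform random data, not on text and not using Weka; the class and method names are made up for this sketch. It compares the nearest and farthest distance from a random query point to a set of random points.

```java
import java.util.Random;

public class DistanceConcentration {

    // Ratio of nearest to farthest Euclidean distance from one random query
    // point to `points` random points in the unit cube of dimension `dims`.
    static double contrastRatio(int dims, int points, long seed) {
        Random rng = new Random(seed);
        double[] query = randomPoint(rng, dims);
        double min = Double.POSITIVE_INFINITY;
        double max = 0.0;
        for (int i = 0; i < points; i++) {
            double d = distance(query, randomPoint(rng, dims));
            min = Math.min(min, d);
            max = Math.max(max, d);
        }
        return min / max;
    }

    // Point with each coordinate drawn uniformly from [0, 1).
    static double[] randomPoint(Random rng, int dims) {
        double[] p = new double[dims];
        for (int i = 0; i < dims; i++) {
            p[i] = rng.nextDouble();
        }
        return p;
    }

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        for (int dims : new int[] {2, 20, 200, 2000}) {
            System.out.printf("dims=%d nearest/farthest=%.3f%n",
                    dims, contrastRatio(dims, 500, 42L));
        }
    }
}
```

A ratio close to 1 means the nearest and farthest points are almost equally far away, which makes distance-based decisions less informative. Real text features are sparse and correlated, so the effect there will differ in size, but this is the general reason why reducing the number of attributes can help.
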
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow