Question

I am using the OpenCV letter_recog.cpp example to experiment with random trees and other classifiers. This example has implementations of six classifiers: random trees, boosting, MLP, kNN, naive Bayes and SVM. The UCI letter recognition dataset with 20000 instances and 16 features is used, which I split in half for training and testing. I have experience with SVM, so I quickly got its recognition error down to 3.3%. After some experimentation, this is what I got:

UCI letter recognition:

  • RTrees - 5.3%
  • Boost - 13%
  • MLP - 7.9%
  • kNN(k=3) - 6.5%
  • Bayes - 11.5%
  • SVM - 3.3%

Parameters used:

  • RTrees - max_num_of_trees_in_the_forest=200, max_depth=20, min_sample_count=1

  • Boost - boost_type=REAL, weak_count=200, weight_trim_rate=0.95, max_depth=7

  • MLP - method=BACKPROP, param=0.001, max_iter=300 (default values; too slow to experiment with)

  • kNN(k=3) - k=3

  • Bayes - none

  • SVM - RBF kernel, C=10, gamma=0.01
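
For reference, this is roughly how the SVM and Boost settings above map onto the OpenCV 2.x parameter structs that letter_recog.cpp uses (a sketch only; the helper names are placeholders, not code from the example):

    #include <opencv2/core/core.hpp>
    #include <opencv2/ml/ml.hpp>

    // Sketch: parameter structs matching the SVM and Boost settings listed above.
    CvSVMParams make_svm_params()
    {
        CvSVMParams p;
        p.svm_type    = CvSVM::C_SVC;  // multi-class classification
        p.kernel_type = CvSVM::RBF;    // RBF kernel
        p.C           = 10;
        p.gamma       = 0.01;
        return p;
    }

    CvBoostParams make_boost_params()
    {
        // boost_type, weak_count, weight_trim_rate, max_depth, use_surrogates, priors
        return CvBoostParams(CvBoost::REAL, 200, 0.95, 7, false, 0);
    }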

After that I used the same parameters and tested on the Digits and MNIST datasets, extracting gradient features first (vector size 200 elements):

Digits:

  • RTrees - 5.1%
  • Boost - 23.4%
  • MLP - 4.3%
  • kNN(k=3) - 7.3%
  • Bayes - 17.7%
  • SVM - 4.2%

MNIST:

  • RTrees - 1.4%
  • Boost - out of memory
  • MLP - 1.0%
  • kNN(k=3) - 1.2%
  • Bayes - 34.33%
  • SVM - 0.6%

I am new to all of these classifiers except SVM and kNN; for those two I can say the results seem fine. What about the others? I expected more from random trees. On MNIST, kNN gives better accuracy - any ideas how to get it higher? Boost and Bayes give very low accuracy. In the end I'd like to use these classifiers to build a multiple classifier system. Any advice?

Solution

Dimensionality Reduction

An important check is to compare the error rates on the training and test datasets to see whether you are overfitting (often aggravated by the "curse of dimensionality"). For example, if your error rate on the test dataset is much larger than the error on the training dataset, this would be one indicator.
In that case, you could try dimensionality reduction techniques such as PCA or LDA.
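
As a rough illustration (a sketch only, assuming the OpenCV 2.x API from letter_recog.cpp and row-sample CV_32FC1 matrices), you could fit cv::PCA on the training data, project both sets, and then compare the error rates of the same trained model on each set:

    #include <opencv2/core/core.hpp>
    #include <opencv2/ml/ml.hpp>

    // Sketch: project train and test samples onto the first `dims`
    // principal components computed from the training data only.
    void reduce_with_pca(const cv::Mat& train_data, const cv::Mat& test_data,
                         int dims, cv::Mat& train_reduced, cv::Mat& test_reduced)
    {
        cv::PCA pca(train_data, cv::Mat(), CV_PCA_DATA_AS_ROW, dims);
        train_reduced = pca.project(train_data);
        test_reduced  = pca.project(test_data);
    }

    // Overfitting check: run the same trained model on both sets and compare.
    // A test error much larger than the training error is the warning sign.
    // Labels are assumed to be stored as floats, as in letter_recog.cpp.
    double error_rate(const CvSVM& svm, const cv::Mat& samples, const cv::Mat& labels)
    {
        int wrong = 0;
        for (int i = 0; i < samples.rows; i++)
            if (svm.predict(samples.row(i)) != labels.at<float>(i))
                wrong++;
        return 100.0 * wrong / samples.rows;
    }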

If you are interested, I have written about PCA, LDA and some other techniques here: http://sebastianraschka.com/index.html#machine_learning and in my GitHub repo here: https://github.com/rasbt/pattern_classification

Cross validation

Also, you may want to take a look at cross-validation techniques in order to evaluate the performance of your classifiers in a more objective manner.
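
The OpenCV 2.x ML module does not provide a generic k-fold helper, so a small manual loop is the usual approach. A sketch, using the SVM as the example model and hypothetical data/labels matrices (one sample per row):

    #include <opencv2/core/core.hpp>
    #include <opencv2/ml/ml.hpp>

    // Sketch: k-fold cross-validation error (in percent) for an SVM with
    // fixed parameters. data: CV_32FC1, one sample per row; labels: CV_32FC1.
    double cross_val_error(const cv::Mat& data, const cv::Mat& labels,
                           const CvSVMParams& params, int k = 5)
    {
        int n = data.rows, wrong = 0;
        for (int fold = 0; fold < k; fold++)
        {
            // Every k-th sample (offset by `fold`) is held out for validation;
            // shuffle the data beforehand if it is ordered by class.
            cv::Mat train_mask = cv::Mat::ones(1, n, CV_8U);
            for (int i = fold; i < n; i += k)
                train_mask.at<uchar>(i) = 0;

            // sampleIdx mask selects the training subset for this fold.
            CvSVM svm;
            svm.train(data, labels, cv::Mat(), train_mask, params);

            // Count mistakes on the held-out fold.
            for (int i = fold; i < n; i += k)
                if (svm.predict(data.row(i)) != labels.at<float>(i))
                    wrong++;
        }
        return 100.0 * wrong / n;
    }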

OTHER TIPS

I expected more from random trees:

  • With random forests, typically sqrt(N) out of N features are considered at each split when growing the trees. Since in your case N=16 (and 200 for the gradient features), you could try setting the number of features per split to about 4 (or about 14 for the 200-dimensional case). In OpenCV this is the nactive_vars parameter of CvRTParams, not max_depth, which controls the depth of the trees; see the sketch after this list.

  • Instead of decision trees, linear models have been proposed and evaluated as base estimators in random forests, in particular multinomial logistic regression and naive Bayes. This might improve your accuracy.
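
A sketch of the first point with the CvRTParams constructor that letter_recog.cpp uses (the eighth argument is nactive_vars; the remaining values match the settings from the question, and train_data/responses are placeholders):

    #include <opencv2/core/core.hpp>
    #include <opencv2/ml/ml.hpp>

    // Sketch: random forest with nactive_vars = 4 (about sqrt(16) for the
    // 16 UCI features); try ~14 for the 200-dimensional gradient features.
    void train_forest(const cv::Mat& train_data, const cv::Mat& responses)
    {
        CvRTParams params(20,     // max_depth (controls tree depth, not features)
                          1,      // min_sample_count
                          0,      // regression_accuracy
                          false,  // use_surrogates
                          16,     // max_categories
                          0,      // priors
                          false,  // calc_var_importance
                          4,      // nactive_vars: features tried at each split
                          200,    // max_num_of_trees_in_the_forest
                          0.01f,  // forest_accuracy
                          CV_TERMCRIT_ITER);
        CvRTrees forest;
        forest.train(train_data, CV_ROW_SAMPLE, responses,
                     cv::Mat(), cv::Mat(), cv::Mat(), cv::Mat(), params);
    }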

On MNIST kNN gives better accuracy, any ideas how to get it higher?

  • Try with a higher value of K (say 5 or 7). A higher value of K would give you more supportive evidence about the class label of a point.
  • You could run PCA or Fisher's Linear Discriminant Analysis before running k-nearest neighbour. That way you could potentially get rid of correlated features when computing distances between the points, and hence your k neighbours would be more robust (see the sketch after this list).
  • Try different K values for different points based on the variance in the distances between the K neighbours.
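
A sketch combining the first two points (the number of PCA components, the k range and the variable names are all placeholders, not tuned values):

    #include <cstdio>
    #include <opencv2/core/core.hpp>
    #include <opencv2/ml/ml.hpp>

    // Sketch: project onto principal components, then evaluate kNN for
    // several k values on a held-out set. All matrices are CV_32FC1,
    // one sample per row; labels hold the class as a float, as in letter_recog.cpp.
    void tune_knn(const cv::Mat& train_data, const cv::Mat& train_labels,
                  const cv::Mat& test_data,  const cv::Mat& test_labels)
    {
        // Decorrelate the features first (second point above).
        cv::PCA pca(train_data, cv::Mat(), CV_PCA_DATA_AS_ROW, 50 /* components */);
        cv::Mat train_p = pca.project(train_data);
        cv::Mat test_p  = pca.project(test_data);

        const int max_k = 9;
        CvKNearest knn(train_p, train_labels, cv::Mat(), false, max_k);

        // Try several k values (first point above) and keep the best.
        for (int k = 3; k <= max_k; k += 2)
        {
            cv::Mat results, neighbor_responses, dists;
            knn.find_nearest(test_p, k, results, neighbor_responses, dists);

            int wrong = 0;
            for (int i = 0; i < test_p.rows; i++)
                if (results.at<float>(i) != test_labels.at<float>(i))
                    wrong++;
            printf("k=%d  error=%.2f%%\n", k, 100.0 * wrong / test_p.rows);
        }
    }
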
Licensed under: CC-BY-SA with attribution