Question

I'm using OpenCV's (C++) SVM (Support Vector Machines) for classification, but I have a problem:

My feature vectors are very large (each has 1,890,000 elements) and I have more than 10,000 of them to train the SVM. How can I manipulate or use these feature vectors without running into memory problems?


Solution

With such high dimensionality and that many training samples, any popular SVM implementation will require a lot of memory. If I were facing this problem, I would consider at least one of these options:

  • Reduce the dimensionality of each vector; there are plenty of algorithms for this, but PCA is a good start (see the PCA sketch after this list).
  • Rent computing time on a host with a lot of memory (one of the Amazon EC2 high-memory instances might suffice).
  • Try a linear online approximation of the SVM. In such high dimensions it is very likely that the classes are linearly separable, and online SVM solvers only need to load one sample into memory at a time, so you need far less memory (I would consider Pegasos-SVM for this; see the second sketch below).
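
A minimal sketch of the first option, assuming samples can be streamed from disk one at a time. cv::PCA is fitted on a random subset of the data (loading all 10,000 x 1,890,000 floats at once would itself exhaust memory, and even the subset shown here takes a few GB), then every sample is projected into the reduced space before training cv::ml::SVM. loadSample() and loadLabel() are hypothetical helpers you would replace with your own I/O; the subset size and component count are assumptions to tune for your data.

    #include <opencv2/core.hpp>
    #include <opencv2/ml.hpp>

    cv::Mat loadSample(int index);   // hypothetical: returns a 1 x 1890000 CV_32F row
    int     loadLabel(int index);    // hypothetical: returns the class label of sample index

    int main() {
        const int numSamples    = 10000;
        const int pcaFitSubset  = 500;   // rows used to estimate the PCA basis (~3.8 GB as float)
        const int numComponents = 256;   // target dimensionality (tune for your data)

        // 1. Fit PCA on a subset of rows to keep the working set manageable.
        cv::Mat subset;
        for (int i = 0; i < pcaFitSubset; ++i)
            subset.push_back(loadSample(i * (numSamples / pcaFitSubset)));
        cv::PCA pca(subset, cv::noArray(), cv::PCA::DATA_AS_ROW, numComponents);
        subset.release();

        // 2. Project every sample into the low-dimensional space, one at a time.
        cv::Mat reduced(numSamples, numComponents, CV_32F);
        cv::Mat labels(numSamples, 1, CV_32S);
        for (int i = 0; i < numSamples; ++i) {
            cv::Mat proj = pca.project(loadSample(i));
            proj.copyTo(reduced.row(i));
            labels.at<int>(i) = loadLabel(i);
        }

        // 3. Train the SVM on the reduced vectors.
        cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
        svm->setType(cv::ml::SVM::C_SVC);
        svm->setKernel(cv::ml::SVM::LINEAR);
        svm->train(reduced, cv::ml::ROW_SAMPLE, labels);
        svm->save("svm_reduced.xml");
        return 0;
    }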
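And a minimal sketch of the third option: a Pegasos-style stochastic sub-gradient solver for a linear SVM. Only one 1,890,000-element sample and the weight vector (about 7.2 MB each as float) are in memory at any time. Again, loadSample() and loadLabel() are hypothetical helpers, labels are assumed to be +1 or -1, and the projection and averaging refinements of the original Pegasos paper are omitted for brevity.

    #include <cstdlib>
    #include <vector>

    std::vector<float> loadSample(int index);  // hypothetical: 1,890,000 floats read from disk
    int                loadLabel(int index);   // hypothetical: +1 or -1

    std::vector<float> trainPegasos(int numSamples, int dim,
                                    double lambda = 1e-4, int iterations = 100000) {
        std::vector<float> w(dim, 0.0f);
        for (int t = 1; t <= iterations; ++t) {
            int i = std::rand() % numSamples;            // pick a random training sample
            std::vector<float> x = loadSample(i);
            double y = loadLabel(i);

            double margin = 0.0;                         // y * <w, x>
            for (int d = 0; d < dim; ++d) margin += w[d] * x[d];
            margin *= y;

            double eta   = 1.0 / (lambda * t);           // Pegasos step size
            double scale = 1.0 - eta * lambda;           // shrinkage from the regularizer
            for (int d = 0; d < dim; ++d) {
                w[d] = static_cast<float>(scale * w[d]);
                if (margin < 1.0)                        // hinge loss active: step toward y*x
                    w[d] = static_cast<float>(w[d] + eta * y * x[d]);
            }
        }
        return w;                                        // sign(<w, x>) classifies new samples
    }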
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow