Question

I have a set of data that I am trying to learn with an SVM. For context, the data has a dimensionality of 35 and contains approximately 30,000 data points.

I have previously trained decision trees on this dataset in MATLAB, which took approximately 20 seconds. Not being totally satisfied with the error rate, I decided to try an SVM.

I first tried svmtrain(X,Y). After about 5 seconds, I got the following message:

??? Error using ==> svmtrain at 453
Error calculating the kernel function:
Out of memory. Type HELP MEMORY for your options.

When I looked up this error, the suggested fix was to use the SMO method: svmtrain(X, Y, 'method', 'SMO'). After about a minute, I got this:

??? Error using ==> seqminopt>seqminoptImpl at 236
No convergence achieved within maximum number (15000) of main loop passes

Error in ==> seqminopt at 100
[alphas offset] = seqminoptImpl(data, targetLabels, ...

Error in ==> svmtrain at 437
[alpha bias] = seqminopt(training, groupIndex, ...

I tried the other methods ('LS' and 'QP'), but I got the first behaviour again: a 5-second delay, then

??? Error using ==> svmtrain at 453
Error calculating the kernel function:
Out of memory. Type HELP MEMORY for your options.

I'm starting to think that I'm doing something wrong, because decision trees were so effortless to use, and here I'm getting stuck on what seems like a very simple operation.

Your help is greatly appreciated.


Solution

Did you read the remarks near the end of the svmtrain documentation about the algorithm's memory usage?

Try setting the method to 'SMO' and using a kernelcachelimit value appropriate to the memory available on your machine.

During training, the algorithm builds a double matrix of size kernelcachelimit-by-kernelcachelimit; the default value is 5000.
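
For example, a minimal sketch (the cache size of 10000 is an assumption, not a recommendation; size it to your own RAM, keeping in mind that a 10000-by-10000 double matrix is roughly 800 MB):

    % Train with SMO and a larger kernel cache.
    % kernelcachelimit = 10000 is a placeholder; adjust it to your machine.
    svmStruct = svmtrain(X, Y, 'method', 'SMO', 'kernelcachelimit', 10000);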

Otherwise, subsample your instances and use techniques like cross-validation to measure the classifier's performance.
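
A rough sketch of that approach, assuming numeric labels in Y and an illustrative subset size of 5000 (both are assumptions, not requirements):

    % Draw a random subsample of the 30,000 instances.
    n = size(X, 1);
    idx = randperm(n);
    idx = idx(1:5000);                 % illustrative subset size
    Xs = X(idx, :);
    Ys = Y(idx);

    % Estimate the error rate with 5-fold cross-validation.
    cv = cvpartition(Ys, 'kfold', 5);
    err = zeros(cv.NumTestSets, 1);
    for i = 1:cv.NumTestSets
        svmStruct = svmtrain(Xs(cv.training(i), :), Ys(cv.training(i)), 'method', 'SMO');
        pred = svmclassify(svmStruct, Xs(cv.test(i), :));
        err(i) = mean(pred ~= Ys(cv.test(i)));   % assumes numeric labels
    end
    fprintf('Mean cross-validation error: %.3f\n', mean(err));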

Here is the relevant section:

Memory Usage and Out of Memory Error

When you set 'Method' to 'QP', the svmtrain function operates on a data set containing N elements, and it creates an (N+1)-by-(N+1) matrix to find the separating hyperplane. This matrix needs at least 8*(N+1)^2 bytes of contiguous memory. If this size of contiguous memory is not available, the software displays an "out of memory" error message.
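
As a concrete illustration with the roughly 30,000 points from the question, that matrix alone needs several gigabytes of contiguous memory, which explains the immediate failure:

    % Memory needed by the (N+1)-by-(N+1) QP matrix for N = 30000:
    N = 30000;
    bytes = 8 * (N + 1)^2;                                    % 8 bytes per double
    fprintf('%.1f GB of contiguous memory\n', bytes / 1e9);   % about 7.2 GB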

When you set 'Method' to 'SMO' (default), memory consumption is controlled by the kernelcachelimit option. The SMO algorithm stores only a submatrix of the kernel matrix, limited by the size specified by the kernelcachelimit option. However, if the number of data points exceeds the size specified by the kernelcachelimit option, the SMO algorithm slows down because it has to recalculate the kernel matrix elements.
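
On Windows, the memory function that the error message mentions reports the largest possible array; a rough sketch of deriving an upper bound for kernelcachelimit from it (the 0.5 safety factor is an arbitrary assumption):

    % Windows only: find the biggest square double matrix that fits
    % in roughly half of the largest contiguous allocation.
    user = memory;
    cacheLimit = floor(sqrt(0.5 * user.MaxPossibleArrayBytes / 8));
    fprintf('Try kernelcachelimit <= %d\n', cacheLimit);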

If you run out of memory when using svmtrain on large data sets, or the optimization step is very time consuming, try either of the following:

  • Use a smaller number of samples and use cross-validation to test the performance of the classifier.

  • Set 'Method' to 'SMO', and set the kernelcachelimit option as large as your system permits.
