Question

I want to implement a simple SVM classifier, in the case of high-dimensional binary data (text), for which I think a simple linear SVM is best. The reason for implementing it myself is basically that I want to learn how it works, so using a library is not what I want.

The problem is that most tutorials go up to an equation that can be solved as a "quadratic problem", but they never show an actual algorithm! So could you point me either to a very simple implementation I could study, or (better) to a tutorial that goes all the way to the implementation details?

Thanks a lot!

Was it helpful?

Solution

Some pseudocode for the Sequential Minimal Optimization (SMO) method can be found in this paper by John C. Platt: Fast Training of Support Vector Machines using Sequential Minimal Optimization. There is also a Java implementation of the SMO algorithm, which is developed for research and educational purpose (SVM-JAVA).

Other commonly used methods to solve the QP optimization problem include:

  • constrained conjugate gradients
  • interior point methods
  • active set methods

But be aware that some math knowledge is needed to understand this things (Lagrange multipliers, Karush–Kuhn–Tucker conditions, etc.).

OTHER TIPS

Are you interested in using kernels or not? Without kernels, the best way to solve these kinds of optimization problems is through various forms of stochastic gradient descent. A good version is described in http://ttic.uchicago.edu/~shai/papers/ShalevSiSr07.pdf and that has an explicit algorithm.

The explicit algorithm does not work with kernels but can be modified; however, it would be more complex, both in terms of code and runtime complexity.

Have a look at liblinear and for non linear SVM's at libsvm

The following paper "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM" top of page 11 describes the Pegasos algorithm also for kernels.It can be downloaded from http://ttic.uchicago.edu/~nati/Publications/PegasosMPB.pdf

It appears to be a hybrid of coordinate descent and subgradient descent. Also, line 6 of the algorithm is wrong. In the predicate the second appearance of y_i_t should be replaced with y_j instead.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top