First, let me address the R solution. From what I understand, the e1071 package is simply a wrapper around the libsvm library, so assuming you use the same settings and perform the same steps in both, you should get the same results.
I'm not a regular R user myself, but from what I can tell you are not performing data normalization in the R code (i.e. scaling the features into the [-1,1] range). Since SVMs are not scale invariant, this omission alone could explain the difference from the other results.
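To make the normalization step concrete, here is a minimal sketch in plain Python (function names are my own, not from any of the libraries discussed): it computes each feature's min/max on the training set and maps values into [-1,1], then applies the *same* transform to the test set, which is exactly how libsvm's `svm-scale` tool is meant to be used (save the ranges from training, restore them for testing).

```python
def fit_minmax(train_rows):
    """Compute per-feature (min, max) over the training data."""
    cols = list(zip(*train_rows))
    return [(min(c), max(c)) for c in cols]

def scale_minmax(rows, stats):
    """Map each feature into [-1, 1] using the training-set min/max."""
    scaled = []
    for row in rows:
        new_row = []
        for x, (lo, hi) in zip(row, stats):
            if hi == lo:                      # constant feature: map to 0
                new_row.append(0.0)
            else:
                new_row.append(2.0 * (x - lo) / (hi - lo) - 1.0)
        scaled.append(new_row)
    return scaled

train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
test  = [[2.0, 40.0]]            # unseen values may land outside [-1, 1]
stats = fit_minmax(train)
print(scale_minmax(train, stats))  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
print(scale_minmax(test, stats))   # [[0.0, 2.0]]
```

The important detail is reusing the training ranges on the test data; fitting the scaler separately on each set would silently shift the features.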
MATLAB has its own implementations in svmtrain and fitcsvm. These only support binary classification, so you will have to handle multi-class problems manually (see here for an example).
The documentation explains that it uses the standard SMO algorithm (actually one of three possible algorithms offered to solve the quadratic-programming optimization problem). The docs list a couple of books and papers at the bottom as references. In principle you should get similar predictions to libsvm (assuming you replicate the parameters used and apply the same kind of preprocessing to the data).
Now as for libsvm vs. liblinear, you should know that the implementations differ a little bit in the formulation of the objective function:
libsvm (C-SVC) solves the following dual problem:

    min_a   (1/2) a' Q a - e' a
    s.t.    y' a = 0,  0 <= a_i <= C,  i = 1..n

where `Q_ij = y_i * y_j * K(x_i, x_j)` for some kernel function K, and e is the vector of all ones.
On the other hand, the dual form solved by liblinear's L2-regularized L1-loss SVC solver is:

    min_a   (1/2) a' Q a - e' a
    s.t.    0 <= a_i <= C,  i = 1..n

where `Q_ij = y_i * y_j * x_i' x_j` (a plain inner product, no kernel), and there is no equality constraint because the bias is handled by augmenting the feature vector rather than as a separate term.
... not to mention that the two are coded with different goals in mind: libsvm is written to allow switching between different kernel functions, while liblinear is optimized for the linear case and has no concept of kernels at all. This is why libsvm is not easily applicable to large-scale problems (even with a linear kernel), and why liblinear is often recommended when you have a large number of instances.
Furthermore, regarding multi-class problems with k classes: libsvm by default implements the one-against-one approach by constructing k*(k-1)/2 binary classifiers, while liblinear implements the one-vs-the-rest strategy by constructing k binary classifiers (it also offers an alternative method by Crammer and Singer for handling multi-class problems). I have previously shown how to perform one-vs-rest classification using libsvm (see here and here).
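To make the one-against-one scheme concrete, here is a sketch in plain Python (the binary models are stand-ins; in practice each would be a libsvm model trained on the data for just those two classes). It enumerates the k*(k-1)/2 class pairs and predicts by majority vote:

```python
from itertools import combinations

def one_vs_one_pairs(labels):
    """The class pairs trained in the one-against-one scheme: k*(k-1)/2 of them."""
    return list(combinations(sorted(set(labels)), 2))

def predict_by_voting(x, pair_models):
    """pair_models maps (class_a, class_b) -> a binary predictor that returns
    class_a or class_b for input x; the class with the most votes wins."""
    votes = {}
    for pair, predict in pair_models.items():
        winner = predict(x)
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

labels = [0, 1, 2, 3]
print(len(one_vs_one_pairs(labels)))   # k=4 classes -> 4*3/2 = 6 binary problems
```

With k classes, one-vs-rest would instead train only k models, each separating one class from all the others, which is the trade-off the paragraph above describes.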
You also have to make sure the parameters passed to each match (as closely as possible):

- libsvm should be set to the C-SVM classifier with a linear kernel by calling `svm-train.exe -s 0 -t 0`
- the liblinear solver type should be set to `L2R_L1LOSS_DUAL` by calling `train.exe -s 3` (dual form of the L2-regularized L1-loss support vector classifier)
- the cost parameter should obviously match: `-c 1` for both training functions
- the tolerance for the termination criterion should match (the default value of the `-e` parameter differs between the two libraries, with `e=0.001` for libsvm and `e=0.1` for liblinear)
- liblinear should be explicitly instructed to add a bias term, since it is disabled by default (by adding `train.exe -B 1`)
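Putting those flags together, the two invocations would look something like this (a sketch only; `data.txt` and the model file names are placeholders):

```shell
# libsvm: C-SVC (-s 0), linear kernel (-t 0), cost 1, stopping tolerance 0.001
svm-train.exe -s 0 -t 0 -c 1 -e 0.001 data.txt libsvm.model

# liblinear: dual L2-regularized L1-loss SVC (-s 3), same cost and tolerance,
# with the bias term explicitly enabled (-B 1)
train.exe -s 3 -c 1 -e 0.001 -B 1 data.txt liblinear.model
```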
Even then, I'm not certain you will get the exact same results in both, but predictions should be close enough...
Other considerations include how the libraries handle categorical features. For instance, I know that libsvm converts a categorical feature with m possible values into m numeric 0/1 features encoded as binary indicator attributes (i.e. only one of them is one, the rest are zeros). I'm not sure what liblinear does with discrete features.
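As a quick illustration of that indicator encoding, here is a plain-Python sketch (the sorted category order is an arbitrary convention of mine, not something either library prescribes):

```python
def one_hot(values):
    """Encode a categorical column as m binary indicator features,
    one column per distinct value."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)   # all zeros...
        row[index[v]] = 1             # ...except the matching category
        rows.append(row)
    return categories, rows

cats, encoded = one_hot(["red", "green", "blue", "green"])
print(cats)      # ['blue', 'green', 'red']
print(encoded)   # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Whatever convention you pick, the crucial part is applying the identical mapping to both libraries' inputs, otherwise the models see different feature spaces.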
Another issue is whether a particular implementation is deterministic, i.e. always returns the same results when repeated on the same data with the same settings. I've read somewhere that liblinear internally generates random numbers during training, but please don't take my word for it without actually checking the source code :)