Question

My general question is: how does LIBSVM perform multivariate regression? In detail, I have data for a certain number of links (for example, 3 links). Each link has 3 variables which, when fed into a model, give an output Y. I have collected data on these links at some interval.

LinkId | var1 |  var2  | var3 | var4(OUTPUT)
   1   |  10  |  12.1  |  2.2 |  3
   2   |  11  |  11.2  |  2.3 |  3.1
   3   |  12  |  12.4  |  4.1 |  1
   1   |  13  |  11.8  |  2.2 |  4
   2   |  14  |  12.7  |  2.3 |  2
   3   |  15  |  10.7  |  4.1 |  6
   1   |  16  |  8.6   |  2.2 |  6.6
   2   |  17  |  14.2  |  2.3 |  4
   3   |  18  |  9.8   |  4.1 |  5

I need to perform prediction to find the output of

(2,19,10.2,2.3). 

How can I do that in Matlab with LIBSVM, using the data above for training? Can I pass the whole data set to svmtrain to create a single model, or do I need to train each link separately and use the model created for that link for prediction? Does it make any difference? NOTE: rows with the same LinkId have the same var3 value.


Solution

This is not really a MATLAB or LIBSVM question but rather a generic SVM one.

"How does LIBSVM perform multivariate regression?"

LIBSVM is just a library which, among other things, implements the Support Vector Regression (SVR) model for regression tasks. In short, in the linear case SVR tries to find a hyperplane such that your data points lie within some margin around it. This is in a sense the reverse of the classical SVM classifier, which tries to separate the data with as large a margin as possible.

[Figure omitted; source: kernelsvm.tripod.com/]
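
The "margin around the hyperplane" idea can be sketched with the epsilon-insensitive loss that SVR is usually described with: residuals smaller than epsilon cost nothing, larger ones are penalised linearly. The snippet below is only an illustration in Python, not LIBSVM code; the weights, bias, sample point and epsilon value are made up for the example and are not fitted to the data in the question.

```python
def linear_predict(w, b, x):
    """Evaluate the linear model f(x) = <w, x> + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon tube, linear in the residual outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Hypothetical weights and bias, chosen only for illustration (not fitted).
w, b = [0.5, -0.2], 1.0
x, y = [2.0, 1.0], 1.85

pred = linear_predict(w, b, x)  # 0.5*2.0 - 0.2*1.0 + 1.0

# Residual of about 0.05 is inside the tube (epsilon = 0.1): no penalty.
inside = epsilon_insensitive_loss(y, pred, epsilon=0.1)

# Residual of about 1.05 is outside the tube: penalised by the excess.
outside = epsilon_insensitive_loss(y + 1.0, pred, epsilon=0.1)
```

Training then amounts to choosing w and b so that the total of these penalties stays small while the model stays as flat as possible.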

In the nonlinear case the kernel trick is used (in the same fashion as in SVM classification), so SVR is still looking for a hyperplane, but in the feature space induced by the chosen kernel, which results in a nonlinear regression in the input space.
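
As a rough sketch of what that means for prediction: a kernel SVR evaluates a weighted sum of kernel values between the query point and the support vectors, plus a bias. The Python snippet below uses the RBF kernel as an example; the support vectors, coefficients, bias and gamma are hypothetical values picked for illustration, not the output of a trained model.

```python
import math


def rbf_kernel(u, v, gamma):
    """RBF kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, coefs, bias, gamma):
    """Evaluate f(x) = sum_i coef_i * K(sv_i, x) + bias."""
    return bias + sum(
        c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, coefs)
    )


# Hypothetical support vectors and coefficients (not a trained model).
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
coefs = [0.7, -0.3]
bias = 0.5

# Close to a support vector, that vector's term dominates the sum.
at_sv = svr_predict([0.0, 0.0], support_vectors, coefs, bias, gamma=1.0)

# Far from all support vectors the kernel terms vanish, leaving the bias.
far_away = svr_predict([10.0, 10.0], support_vectors, coefs, bias, gamma=1.0)
```

The function is linear in the kernel values but nonlinear in x, which is the "hyperplane in feature space, curve in input space" picture.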


A nice introduction to SVR can be found here: http://alex.smola.org/papers/2003/SmoSch03b.pdf

"How can I do that in Matlab with LIBSVM, using the data above for training? Can I pass the whole data set to svmtrain to create a single model, or do I need to train each link separately and use the model created for that link for prediction? Does it make any difference? NOTE: rows with the same LinkId have the same var3 value."

You could train SVR (as it is a regression problem) with the whole data, but:

  • it seems that var3 and LinkId are the same variable (1->2.2, 2->2.3, 3->4.1); if this is the case, you should remove the LinkId column,
  • are the values of var1 unique ascending integers? If so, var1 is probably also a useless feature (it does not seem to carry any information and looks like a row id),
  • you should preprocess your data before applying SVM so that, e.g., each column contains values from the [0,1] interval; otherwise some features may become more important than others just because of their scale.
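
The checks and the scaling step above can be sketched as follows. This is a plain-Python illustration using the rows from the question, not LIBSVM code; the helper name min_max_scale and the variable names are made up for the example, and it assumes LinkId is dropped while var1 is kept for now.

```python
# Rows from the question: (LinkId, var1, var2, var3, var4)
rows = [
    (1, 10, 12.1, 2.2, 3.0),
    (2, 11, 11.2, 2.3, 3.1),
    (3, 12, 12.4, 4.1, 1.0),
    (1, 13, 11.8, 2.2, 4.0),
    (2, 14, 12.7, 2.3, 2.0),
    (3, 15, 10.7, 4.1, 6.0),
    (1, 16, 8.6, 2.2, 6.6),
    (2, 17, 14.2, 2.3, 4.0),
    (3, 18, 9.8, 4.1, 5.0),
]

# LinkId adds nothing if LinkId and var3 map one-to-one onto each other.
pairs = {(r[0], r[3]) for r in rows}
link_ids = {r[0] for r in rows}
var3_values = {r[3] for r in rows}
link_id_is_redundant = len(pairs) == len(link_ids) == len(var3_values)

# Keep var1, var2, var3 as features (LinkId dropped); var4 is the target.
features = [list(r[1:4]) for r in rows]
targets = [r[4] for r in rows]

# Per-column min and max, computed on the training rows only.
col_min = [min(col) for col in zip(*features)]
col_max = [max(col) for col in zip(*features)]


def min_max_scale(x):
    """Scale one feature vector with the training min/max (hypothetical helper)."""
    return [(v - lo) / (hi - lo) for v, lo, hi in zip(x, col_min, col_max)]


scaled_features = [min_max_scale(x) for x in features]

# Query (2, 19, 10.2, 2.3) without its LinkId, scaled with the training ranges.
scaled_query = min_max_scale([19, 10.2, 2.3])
```

Note that the query must be scaled with the training minima and maxima, not its own. Its var1 value (19) lies outside the training range 10 to 18, so it scales to 1.125, above 1, which fits the suspicion that var1 is just a running index. In LIBSVM the scaled feature matrix and the target vector are what would then be passed to svmtrain, with a regression type such as '-s 3' (epsilon-SVR) selected in the options string.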

Now, if you would like to create a separate model for each link and follow the hints above, you end up with 1 input variable (var2) and 1 output variable (var4), so I would not recommend such a step. In general, it seems that you have a very limited feature set; it would be valuable to gather more informative features.

Licensed under: CC-BY-SA with attribution