Question

When I open the sample training data files for LIBSVM, I can't understand the file structure. Can someone please show me how to create one?


Below is my training data for predicting the songwriter of a song (as an example):

  • Feature 1: Number of times the word "love" appears in the lyrics
  • Feature 2: Number of times the word "friend" appears in the lyrics
  • Feature 3: Number of times the word "zone" appears in the lyrics


Training data:

  1. Song A (3, 0, 0), song writer is David
  2. Song B (0, 3, 1), song writer is Peter
  3. Song C (1, 3, 1), song writer is Tom


Testing data:

  1. Song D (3, 0, 1)


Thank you very much.

Solution

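The LIBSVM README file can help you.

Each line of the training data file must look like this:

label index1:value1 index2:value2 ...

Indices must be in ascending order. No terminating -1 entry goes in the file; an index of -1 only marks the end of an svm_node array when you call the C library directly.

If you build the data in code instead of reading a file, LIBSVM's svm_node holds the same index/value pair.

Sample code in Java: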

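import libsvm.svm_node;
import libsvm.svm_problem;

// features[k][j] is the value of feature j for sample k; labels[k] is its class.
svm_problem prob = new svm_problem();
prob.l = dataCount;
prob.x = new svm_node[dataCount][];
prob.y = new double[dataCount];
for (int k = 0; k < dataCount; k++) {
    prob.x[k] = new svm_node[features[k].length];
    for (int j = 0; j < features[k].length; j++) {
        svm_node node = new svm_node();
        node.index = featuresIndex[j];   // 1-based feature number
        node.value = features[k][j];
        prob.x[k][j] = node;
    }
    prob.y[k] = labels[k];
}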
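
To see how one line of the text format maps onto index/value pairs, here is a minimal sketch that parses a line without needing the LIBSVM jar. The class LibsvmLineParser and its nested Node class are hypothetical names made up for this illustration; Node is only a stand-in for svm_node, not the library's own class.

```java
import java.util.ArrayList;
import java.util.List;

class LibsvmLineParser {
    // Hypothetical stand-in for svm_node: one (index, value) pair.
    static class Node {
        final int index;
        final double value;

        Node(int index, double value) {
            this.index = index;
            this.value = value;
        }
    }

    // Returns the label stored in the first column of the line.
    static double parseLabel(String line) {
        return Double.parseDouble(line.trim().split("\\s+")[0]);
    }

    // Returns the index:value pairs that follow the label.
    static List<Node> parseFeatures(String line) {
        String[] tokens = line.trim().split("\\s+");
        List<Node> nodes = new ArrayList<>();
        for (int i = 1; i < tokens.length; i++) {
            String[] pair = tokens[i].split(":");
            nodes.add(new Node(Integer.parseInt(pair[0]), Double.parseDouble(pair[1])));
        }
        return nodes;
    }

    public static void main(String[] args) {
        String line = "2 1:1 2:3 3:1";
        System.out.println("label = " + parseLabel(line));
        for (Node n : parseFeatures(line)) {
            System.out.println(n.index + " -> " + n.value);
        }
    }
}
```

Each parsed pair corresponds to one svm_node (index and value), and the label corresponds to one entry of prob.y.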

OTHER TIPS

This is a classification problem with three classes, David, Peter and Tom, to which we assign the labels 0, 1 and 2 respectively.

Each line of the data set has the following format:

[label] [feature number]:[number of times that feature occurs] ...

Our training data file will look like this:

0 1:3 2:0 3:0
1 1:0 2:3 3:1
2 1:1 2:3 3:1

This file can be used to train the model. It has 3 rows and 4 columns: the first column is the label (the actual class), and each remaining column is a feature number followed by the number of times that feature occurs.
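
If the counts already live in arrays, the rows can be generated rather than typed by hand. The following is only a sketch: the class LibsvmLineWriter, its toLine method and the skipZeros flag are hypothetical names for this example, not part of LIBSVM. The flag relies on the fact that the format is sparse, so features whose value is 0 may be left out of a line.

```java
import java.util.StringJoiner;

class LibsvmLineWriter {
    // Builds "<label> 1:<v1> 2:<v2> ..." using 1-based, ascending feature numbers.
    static String toLine(int label, int[] features, boolean skipZeros) {
        StringJoiner line = new StringJoiner(" ");
        line.add(Integer.toString(label));
        for (int j = 0; j < features.length; j++) {
            if (skipZeros && features[j] == 0) {
                continue; // zero-valued features may be omitted in the sparse format
            }
            line.add((j + 1) + ":" + features[j]);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        int[][] songs = {{3, 0, 0}, {0, 3, 1}, {1, 3, 1}}; // Song A, B, C
        int[] writers = {0, 1, 2};                         // David, Peter, Tom
        for (int k = 0; k < songs.length; k++) {
            System.out.println(toLine(writers[k], songs[k], false));
        }
    }
}
```

Running main prints one line per song in the format described above; redirecting that output to a file gives a training file.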

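The testing data uses the same format. The label column is still required; when the true label is unknown, fill it with any number (0 here), since it is only used to compute the reported accuracy:

0 1:3 2:0 3:1

This line is passed to the trained SVM model, which then produces the prediction.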

Licensed under: CC-BY-SA with attribution