SVM-pref package from Cornell University

https://stackoverflow.com//questions/22037638

21-12-2019

Question

I'm using SVM-pref (http://svmlight.joachims.org) for a binary classification problem. I don't have much experience with this package and so I seek help with the following questions:

(1) My features are all discrete/nominal. Is there a special way to represent the feature vectors, such as converting the nominal values into continuous values, or do we just replace the nominal values with dummy numbers like 1, 2, 3, etc.?

(2) If the answer to the first question is that we replace nominal values with dummy numbers, should we start numbering feature values from 1, so that we have 1:1 but never 1:0? Otherwise the learner will treat a zero-valued feature as non-existent. Is that correct?

(3) How do we find the best -c value and the values for the rest of the parameters? Is it only by trial and error, or are there other approaches for deciding on these parameters?

Solution 2

Here is another useful and informative discussion about representing nominal features for SVM classifiers.

Other tips

To use categorical features in an SVM you must encode them using dummy variables, e.g. one-hot encoding. For every level of the category, you introduce one dimension. For a feature with levels A, B and C it looks like this:

A -> [1,0,0]
B -> [0,1,0]
C -> [0,0,1]
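
To make the mapping above concrete for the SVM-light input format (`<label> <index>:<value> ...`, with feature indices starting at 1, listed in increasing order, and zero-valued features simply omitted), here is a minimal sketch. The helper names `build_index` and `to_svmlight_line` and the toy data are made up for illustration and are not part of the package:

```python
def build_index(rows):
    """Assign one feature index (starting at 1) to each (column, level) pair."""
    index = {}
    for row in rows:
        for col, level in enumerate(row):
            key = (col, level)
            if key not in index:
                index[key] = len(index) + 1
    return index


def to_svmlight_line(label, row, index):
    """Format one example as '<label> <index>:1 ...' with ascending indices."""
    # Only the active (value 1) dimensions are written; zeros are left out.
    active = sorted(index[(col, level)] for col, level in enumerate(row))
    features = " ".join(f"{i}:1" for i in active)
    return f"{label:+d} {features}"


# Toy data (hypothetical): two nominal features per example, labels in {+1, -1}.
rows = [("A", "red"), ("B", "red"), ("C", "blue")]
labels = [1, -1, 1]

index = build_index(rows)
lines = [to_svmlight_line(y, x, index) for y, x in zip(labels, rows)]
print("\n".join(lines))
```

Each level gets its own index, so a written value is always 1 and the absent levels are implicitly 0. Note that this sketch raises a `KeyError` for a level that was not seen when the index was built, so the index should be built from all the levels you expect to encounter.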
(2) See the answer to the previous question: use one dimension per categorical level.
(3) Typically this is done by testing candidate values of -c (and of the other parameters) in a cross-validation setting.
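
As a rough sketch of what testing values in a cross-validation setting can look like: split the examples into k folds, train on k-1 of them for each candidate C, score on the held-out fold, and keep the C with the best mean score. The `train_and_score` callback is hypothetical; in practice it would write the two splits to files, run the learner and the classifier on them, and return something like validation accuracy. The examples are assumed to be shuffled beforehand.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def select_c(examples, candidates, train_and_score, k=5):
    """Return (best_c, best_mean_score) over the candidate C values.

    train_and_score(train, valid, c) is a user-supplied (hypothetical)
    callback that trains with parameter c and returns a score where
    higher is better.
    """
    folds = k_fold_indices(len(examples), k)
    best_c, best_score = None, float("-inf")
    for c in candidates:
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [e for i, e in enumerate(examples) if i not in held]
            valid = [examples[i] for i in held_out]
            scores.append(train_and_score(train, valid, c))
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best_c, best_score = c, mean
    return best_c, best_score
```

A common choice is to try candidates on a logarithmic grid, for example `[0.01, 0.1, 1, 10, 100]`, and then refine around the best value if needed.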

License: CC-BY-SA with attribution

Not affiliated with StackOverflow