If you use the linear kernel, then yes - simply compute the weight vector:
w = SUM_i y_i alpha_i sv_i
Where:
- sv_i - the i-th support vector
- alpha_i - the coefficient found by SVMlight
- y_i - the corresponding class label (+1 or -1)

(in some implementations the alpha_i's are already multiplied by y_i, so they can be positive or negative)
Once you have w, which has dimensions 1 x d, where d is your data dimension (the number of words in the bag-of-words/tf-idf representation), simply select the dimensions with the highest absolute values (regardless of sign) to find the most important features (words).
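As a minimal sketch of this computation (with made-up example data, and assuming the alphas are already multiplied by y_i, as described above):

```python
import numpy as np

# Hypothetical example: 4 support vectors in a 6-dimensional
# bag-of-words space. The alphas here are assumed to already
# carry the class sign (alpha_i * y_i).
support_vectors = np.array([
    [0.0, 1.2, 0.0, 0.3, 0.0, 0.5],
    [0.7, 0.0, 0.0, 0.0, 0.9, 0.0],
    [0.0, 0.4, 1.1, 0.0, 0.0, 0.2],
    [0.3, 0.0, 0.0, 0.8, 0.0, 0.0],
])
alphas = np.array([0.6, -0.9, 0.4, -0.5])  # signed alpha_i * y_i

# w = SUM_i (y_i * alpha_i) * sv_i  -> vector of shape (d,)
w = alphas @ support_vectors

# Rank dimensions (words) by |w_j|, highest first
top = np.argsort(-np.abs(w))
print(top[:3])  # indices of the 3 most important words
```

Mapping the resulting indices back to words just requires the vocabulary used to build the bag-of-words/tf-idf representation.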
If you use some other kernel (like RBF), then the answer is no - there is no direct method of extracting the most important features, as the classification is performed in a completely different (implicit, high-dimensional) feature space.