If you use the linear kernel, then yes - simply compute the weight vector:
w = SUM_i y_i alpha_i sv_i
Where:
- sv_i - the i-th support vector
- alpha_i - the coefficient found by SVMlight
- y_i - the corresponding class label (+1 or -1)

(in some implementations the alpha_i's are already multiplied by y_i, so they can be positive or negative)
Once you have w, which has dimensions 1 x d, where d is your data dimension (the number of words in the bag-of-words/tf-idf representation), simply select the dimensions with the highest absolute values (regardless of sign) to find the most important features (words).
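As a minimal sketch of this computation (with made-up example data, and assuming the alphas are already multiplied by y_i, as described above):

```python
import numpy as np

# Hypothetical example: 4 support vectors in a 6-dimensional
# bag-of-words space. The alphas here are assumed to already
# carry the class sign (alpha_i * y_i).
support_vectors = np.array([
    [0.0, 1.2, 0.0, 0.3, 0.0, 0.5],
    [0.7, 0.0, 0.0, 0.0, 0.9, 0.0],
    [0.0, 0.4, 1.1, 0.0, 0.0, 0.2],
    [0.3, 0.0, 0.0, 0.8, 0.0, 0.0],
])
alphas = np.array([0.6, -0.9, 0.4, -0.5])  # signed alpha_i * y_i

# w = SUM_i (y_i * alpha_i) * sv_i  -> vector of shape (d,)
w = alphas @ support_vectors

# Rank dimensions (words) by |w_j|, highest first
top = np.argsort(-np.abs(w))
print(top[:3])  # indices of the 3 most important words
```

Mapping the resulting indices back to words just requires the vocabulary used to build the bag-of-words/tf-idf representation.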
If you use some other kernel (like RBF), then the answer is no - there is no direct method of extracting the most important features, as the classification is performed in a completely different (implicit, high-dimensional) feature space.