Question

My question is three-fold.

In the context of "kernelized" support vector machines:

  1. Is variable/feature selection desirable? It seems counter-intuitive, since we already regularize via the parameter C to prevent overfitting, and the main motive for introducing kernels into an SVM is to increase the dimensionality of the problem; reducing dimensions by parameter reduction appears to work against that.
  2. If the answer to the first question is "no", under what conditions would that answer change?
  3. Are there good methods for feature reduction with SVMs in Python's scikit-learn library? I have tried SelectFpr (a sketch of the kind of pipeline I mean follows this list) and am looking for people with experience with other methods.
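For context, here is a minimal sketch of the kind of pipeline I mean; the synthetic data and the alpha value are purely illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFpr, f_classif
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic data standing in for the real problem.
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)

# Univariate selection by false-positive rate, followed by an RBF-kernel SVM.
pipe = Pipeline([
    ("select", SelectFpr(f_classif, alpha=0.05)),
    ("svm", SVC(kernel="rbf", C=1.0)),
])
pipe.fit(X, y)
print(pipe.named_steps["select"].get_support().sum(), "features kept")
```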

Solution

Personally, I like to divide feature selection into two categories:

  • unsupervised feature selection
  • supervised feature selection

Unsupervised feature selection covers things like clustering or PCA, where you select the least redundant set of features (or create new features with little redundancy). Supervised feature selection covers things like the Lasso, where you select the features with the most predictive power.

I usually prefer what I call supervised feature selection. So, when using linear regression, I would select features with the Lasso. Similar methods exist to induce sparsity in neural networks.
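As a minimal sketch of what I mean by Lasso-based selection (the synthetic data and the alpha are illustrative and would need tuning):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Synthetic regression data; only a few features are actually informative.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=1.0, random_state=0)

# Fit a Lasso and keep only the features whose coefficient survives
# the L1 penalty (i.e. is essentially non-zero).
selector = SelectFromModel(Lasso(alpha=0.1)).fit(X, y)
X_reduced = selector.transform(X)
print(X.shape, "->", X_reduced.shape)
```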

But indeed, I don't see how to do that directly in a method using kernels, so you are probably better off with what I call unsupervised feature selection.
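As a minimal sketch of that unsupervised route, assuming a PCA step in front of a kernel SVM (the number of components is a placeholder to tune):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           random_state=0)

# Remove redundancy with PCA before the kernel is applied; n_components
# is a guess that should be tuned (e.g. by cross-validation).
pipe = Pipeline([
    ("pca", PCA(n_components=10)),
    ("svm", SVC(kernel="rbf")),
])
pipe.fit(X, y)
```

Keeping the PCA step inside the pipeline means the reduction is refit within each cross-validation fold, which avoids leaking information when tuning.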

EDIT: you also asked about regularization. I see regularization as helping mainly because we work with finite samples, so the training and test distributions will always differ somewhat, and you want your model not to overfit. I am not sure it removes the need for feature selection (if you indeed have too many features). I think that selecting features (or creating a smaller set of them) helps by making the features you do have more robust and by preventing the model from learning spurious correlations. So regularization does help, but I am not sure it is a complete alternative. I haven't thought this through thoroughly, though.
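To make the regularization point concrete, here is a minimal sketch of tuning C (and gamma) by cross-validation; the grid values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Pick C (and gamma) by cross-validation instead of trusting a default;
# C is the regularization knob the question refers to.
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.1]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_)
```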

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange