The χ² test builds a contingency table of n_classes times n_features. In a regression model, there is no notion of n_classes. The only way to make it work would be to bin your y values, do feature selection, then train a regression model on the original y and the reduced feature set. There is no support for this in scikit-learn, so you'll have to program it yourself.
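The bin-then-select workaround described above could be sketched roughly as follows. This is a minimal illustration, not a validated recipe. The synthetic data, the choice of 5 quantile bins, k=5, and the MinMaxScaler step are all assumptions made for the example (scikit-learn's chi2 requires non-negative features, and is really meant for counts or frequencies, so its scores on rescaled continuous features should be treated with caution).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

# Synthetic regression data, purely illustrative.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       random_state=0)

# chi2 rejects negative values, so rescale every feature to [0, 1].
X_nonneg = MinMaxScaler().fit_transform(X)

# Bin the continuous target into 5 pseudo-classes at its quintiles
# (the number of bins is an arbitrary choice for this sketch).
edges = np.quantile(y, [0.2, 0.4, 0.6, 0.8])
y_binned = np.digitize(y, edges)  # integer labels 0..4

# Feature selection against the binned target.
selector = SelectKBest(chi2, k=5).fit(X_nonneg, y_binned)
X_reduced = selector.transform(X_nonneg)

# Train the regression model on the ORIGINAL y and the reduced feature set.
model = LinearRegression().fit(X_reduced, y)
print("selected feature indices:", selector.get_support(indices=True))
```

The bin count matters here: too few bins discards most of the information in y, while too many leaves sparse cells in the contingency table.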
chi-square as scoring function for regression
30-07-2022
Question
The documentation at http://scikit-learn.org/0.9/modules/feature_selection.html states: "Warning: Beware not to use a regression scoring function with a classification problem."
I am trying to find the best features for a regression problem, using f_regression as the scoring function. But it is extremely memory-hungry: my 8 GB machine hangs and I eventually get a memory error.
I have used chi2 as a scoring function for the same problem and it runs very fast. Is the reverse of the warning also true? If not, can I use chi2 as a scoring function for a regression problem?
Solution
OTHER TIPS
No, you should not use the chi2 scoring function, as it has no proven guarantee of being accurate for a regression model. Check your f_regression setup, or use another approach such as recursive feature elimination or PCA (Principal Component Analysis):
http://en.wikipedia.org/wiki/Principal_component_analysis
I would personally advise PCA; it gives very robust results.
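For reference, both alternatives mentioned in this answer exist in scikit-learn as RFE and PCA. The sketch below is only an illustration on synthetic data. The estimator (LinearRegression) and the target of 5 features/components are assumptions. Note that PCA builds new components as linear combinations of all columns rather than picking a subset of the original features, and it does not look at y at all.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic regression data, purely illustrative.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       random_state=0)

# Recursive feature elimination: repeatedly fit the estimator and drop
# the feature with the smallest coefficient until 5 remain.
rfe = RFE(estimator=LinearRegression(), n_features_to_select=5).fit(X, y)
X_rfe = rfe.transform(X)  # a subset of the original columns

# PCA: project onto the 5 directions of highest variance (unsupervised).
pca = PCA(n_components=5).fit(X)
X_pca = pca.transform(X)  # new derived columns, not original features

print("RFE kept columns:", rfe.get_support(indices=True))
```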
I'd suggest you use Lasso if your problem is regression. Lasso is just standard linear regression with L1 regularization baked in; this has the effect of driving many feature weights to zero, so the features left with nonzero weights are the selected ones.
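A minimal sketch of Lasso-based selection with scikit-learn might look like this. The synthetic data and alpha=1.0 are assumptions for illustration only. In practice alpha would need tuning (for example with LassoCV), and features are standardized first because the L1 penalty is scale-sensitive.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic regression data, purely illustrative.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Put all features on the same scale so the penalty treats them equally.
X_scaled = StandardScaler().fit_transform(X)

# alpha controls the strength of the L1 penalty; larger values zero out
# more coefficients. The value 1.0 is an arbitrary choice for this sketch.
lasso = Lasso(alpha=1.0).fit(X_scaled, y)

# Features whose weight was not driven to zero are the "selected" ones.
selected = np.flatnonzero(lasso.coef_)
print("features with nonzero weight:", selected)
```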