
The scikit-learn feature selection documentation includes this warning: "Beware not to use a regression scoring function with a classification problem."

I am trying to find the best features for a regression problem, using f_regression as the scoring function. But it is extremely memory hungry: my 8 GB machine hangs and I eventually get a MemoryError.

I have used chi2 as the scoring function for the same problem and it runs very fast. Is the reverse of the warning also true? If not, can I use chi2 as a scoring function for a regression problem?



The χ² test builds a contingency table of n_classes times n_features. In a regression model, there is no notion of n_classes. The only way to make it work would be to bin your y values, do feature selection, then train a regression model on the original y and the reduced feature set. There is no support for this in scikit-learn, so you'll have to program it yourself.
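A minimal sketch of the bin-then-select workaround described above, not an established scikit-learn recipe. The synthetic data, the choice of five quantile bins, k=5, and the use of LinearRegression as the final model are all assumptions made for illustration. Because chi2 only accepts non-negative features, the sketch also rescales X to [0, 1] first.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

# Synthetic data stands in for the real problem (assumption).
X, y = make_regression(n_samples=200, n_features=20, n_informative=5, random_state=0)

# chi2 requires non-negative feature values, so rescale each column to [0, 1].
X_nonneg = MinMaxScaler().fit_transform(X)

# Bin the continuous target into 5 pseudo-classes at its quintiles
# (the number of bins is an arbitrary choice).
edges = np.quantile(y, [0.2, 0.4, 0.6, 0.8])
y_binned = np.digitize(y, edges)  # integer labels 0..4

# Score the features against the binned target and keep the top k.
selector = SelectKBest(chi2, k=5).fit(X_nonneg, y_binned)
X_reduced = selector.transform(X_nonneg)

# Train the regression model on the ORIGINAL continuous y.
model = LinearRegression().fit(X_reduced, y)
```

The number of bins is a tuning knob: too few bins throw away information about y, while too many leave each pseudo-class with very few samples.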


No, you should not use chi2 as the scoring function: it has no proven guarantee of being accurate for a regression model. Either debug your f_regression setup or use another approach, such as recursive feature elimination or PCA (Principal Component Analysis).

Personally, I would advise PCA; it gives very robust results.
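For reference, the two alternatives mentioned here might look roughly like this in scikit-learn. This is only a sketch: the synthetic data, the LinearRegression estimator inside RFE, and the target of 5 features or components are assumptions, not recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data stands in for the real problem (assumption).
X, y = make_regression(n_samples=200, n_features=20, n_informative=5, random_state=0)

# Recursive feature elimination: fit the estimator, drop the weakest
# feature, and repeat until 5 features remain.
rfe = RFE(estimator=LinearRegression(), n_features_to_select=5).fit(X, y)
X_rfe = rfe.transform(X)

# PCA: project onto the 5 leading principal components. Each component is a
# linear combination of all features, not a subset of the original columns.
pca = PCA(n_components=5).fit(X)
X_pca = pca.transform(X)
```

Note that PCA never looks at y, so it reduces dimensionality rather than picking the features most related to the target, whereas RFE keeps original columns.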

I'd suggest using Lasso if your problem is regression. Lasso is just linear regression with L1 regularization built in; this has the effect of driving many feature weights to exactly zero, so the features left with nonzero weights are the selected ones.

scikit-learn has an implementation of Lasso (sklearn.linear_model.Lasso).
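A small sketch of Lasso used for feature selection. The synthetic data and alpha=1.0 are assumptions; in practice alpha would be tuned (for example with LassoCV), and the features are standardized here so that the L1 penalty treats them comparably.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for the real problem (assumption).
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=1.0, random_state=0)

# Put all features on the same scale before applying the L1 penalty.
X_scaled = StandardScaler().fit_transform(X)

# alpha controls the regularization strength; 1.0 is an arbitrary choice.
lasso = Lasso(alpha=1.0).fit(X_scaled, y)

# Features whose weight was not driven to zero are the ones Lasso kept.
selected = np.flatnonzero(lasso.coef_)
X_reduced = X_scaled[:, selected]
print("kept feature indices:", selected)
```

A larger alpha zeroes out more weights and keeps fewer features; a smaller alpha keeps more.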

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow