Question

I am new to ML and am working on a kaggle competition to learn a bit. When I add certain features to my dataset, the accuracy decreases.

Why isn't the feature that adds to the cost just weighted to zero (ignored)? Is it because non-linear features can cause a local-minimum solution?

Thanks.


Solution

If you're talking about training error for a linear regression model, then adding features will never increase your error unless you have a bug. As you suggest, the problem is convex, so the global solution can never get worse when a feature is added: the optimizer can always set that feature's weight to zero and recover the old solution.
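A quick numerical check of that argument, using a toy dataset and NumPy's least-squares solver (the data and noise feature here are made up for illustration): the model with the extra, useless column can always set its weight to (near) zero, so its training MSE never exceeds the smaller model's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: one informative feature, noisy target.
n = 100
X = rng.normal(size=(n, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=n)

def train_mse(features, target):
    """Fit ordinary least squares (with intercept) and return training MSE."""
    A = np.column_stack([np.ones(len(features)), features])
    w, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ w
    return float(np.mean(resid ** 2))

# Add one column of pure noise, unrelated to y.
X_extra = np.column_stack([X, rng.normal(size=n)])

mse_base = train_mse(X, y)
mse_extra = train_mse(X_extra, y)

# The larger model can reproduce the smaller one by zeroing the new
# weight, so its training MSE can never be higher.
print(mse_base, mse_extra)
```

If the training error goes *up* after adding a column, that points to a bug (e.g. a non-converged iterative solver or a preprocessing mistake), not to the math.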

If you're talking about test error, however, then overfitting becomes the big issue as you add features, and a drop in accuracy is exactly what you would expect to observe.
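To see the overfitting side concretely, here is a small sketch (synthetic data, made-up sizes): padding one informative feature with 28 pure-noise columns lets ordinary least squares drive the training error to essentially zero while the test error gets much worse.

```python
import numpy as np

rng = np.random.default_rng(0)

def design(features):
    """Prepend an intercept column."""
    return np.column_stack([np.ones(len(features)), features])

def fit(features, target):
    """Ordinary least squares via lstsq."""
    w, *_ = np.linalg.lstsq(design(features), target, rcond=None)
    return w

def mse(features, w, target):
    return float(np.mean((target - design(features) @ w) ** 2))

# Small training set, larger test set, one truly informative feature.
n_train, n_test = 30, 200
X_train = rng.normal(size=(n_train, 1))
X_test = rng.normal(size=(n_test, 1))
y_train = 3.0 * X_train[:, 0] + rng.normal(scale=0.5, size=n_train)
y_test = 3.0 * X_test[:, 0] + rng.normal(scale=0.5, size=n_test)

# Pad with 28 pure-noise features: as many columns as training samples,
# so the fit can interpolate the training noise exactly.
Xb_train = np.column_stack([X_train, rng.normal(size=(n_train, 28))])
Xb_test = np.column_stack([X_test, rng.normal(size=(n_test, 28))])

w_small, w_big = fit(X_train, y_train), fit(Xb_train, y_train)

train_small = mse(X_train, w_small, y_train)
train_big = mse(Xb_train, w_big, y_train)
test_small = mse(X_test, w_small, y_test)
test_big = mse(Xb_test, w_big, y_test)

print(f"train MSE: {train_small:.3f} -> {train_big:.3f}")  # never increases
print(f"test  MSE: {test_small:.3f} -> {test_big:.3f}")
```

The training error can only improve, but the noise features let the model memorize the training set, and that memorized noise hurts on held-out data. This is why Kaggle scores (computed on data the model never saw) can drop when you add features.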

OTHER TIPS

I can't comment, so I'm posting this as an answer.

@agilefall: you are not necessarily wrong. If you are measuring accuracy as the correlation between predicted and actual output, then accuracy can decrease as you add more features. Linear regression does not guarantee anything about that metric; it only minimizes squared error on the training set.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow