Question

I have a question about ridge regression and its benefits (relative to OLS) when datasets are big. Do the benefits of ridge regression disappear as datasets grow larger (e.g. 50,000 observations vs. 1,000)? When the dataset is large enough, wouldn't an ordinary OLS model be able to determine which parameters are more important, reducing the need for the penalty term? Ridge regression makes sense when datasets are small and there is scope for high variance, but should we expect its intended benefits (relative to OLS) to disappear for large datasets?

Solution

One thing your reasoning may be overlooking is that adding predictors does not necessarily lead to a better model. The more predictors you have, the higher the risk of collinearity between them, and this increases the utility of ridge regression.
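A minimal sketch of this point, using scikit-learn on synthetic data (the near-duplicate column and the penalty strength alpha=1.0 are purely illustrative assumptions): two nearly identical predictors can push OLS into large, offsetting coefficients, while ridge keeps the estimates stable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly identical to x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(size=n)            # only x1 truly matters

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("OLS coefficients:  ", ols.coef_)    # often large, offsetting values
print("Ridge coefficients:", ridge.coef_)  # shrunk toward a stable split
```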

A second, related point:

The larger the number of predictors in your model, the more useful techniques like ridge regression become. With many predictors it is very difficult to determine whether collinearity or other relationships exist between them. In a smaller model with, say, 5 predictors, you could verify this fairly easily; with 1,000 predictors it would be much more difficult.
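To put a rough number on why manual inspection stops scaling (a back-of-the-envelope sketch with synthetic, independent predictors), the count of pairwise correlations alone grows quadratically with the number of predictors, before even considering relationships involving three or more variables:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
for p in (5, 1000):
    X = rng.normal(size=(n, p))              # synthetic predictors
    corr = np.corrcoef(X, rowvar=False)      # p x p correlation matrix
    pairs = corr[np.triu_indices(p, k=1)]    # the distinct off-diagonal pairs
    print(f"p={p}: {pairs.size} pairwise correlations to inspect")
# p=5: 10 pairwise correlations to inspect
# p=1000: 499500 pairwise correlations to inspect
```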

In the case of lasso regression the benefit is even more obvious: since predictor coefficients can be shrunk all the way to zero, the penalty acts as a form of feature selection. Here, you are potentially removing predictors that capture redundant information.
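A minimal sketch of that behaviour, assuming synthetic data in which only two columns are informative (the penalty alpha=0.1 is an arbitrary choice):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p = 300, 20
X = rng.normal(size=(n, p))
y = 4 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)  # only columns 0 and 1 matter

lasso = Lasso(alpha=0.1).fit(X, y)
kept = np.flatnonzero(lasso.coef_)          # indices with non-zero coefficients
print("Predictors kept by the lasso:", kept)  # typically just [0 1]
```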

Also, in answer to the second part of your question: OLS does not perform any form of feature selection, so it will not be able to pick out which parameters are the most important simply because you feed it more data. Moreover, the more parameters you add to your model, the higher the variance gets. Why is this? You have more quantities to estimate, so collectively your model is less reliable. Consequently, your model will likely inflate the coefficients on some predictors, giving them undue "weight" or importance.
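A small simulation of this variance argument (all data synthetic; the penalty alpha=10.0 is an arbitrary assumption): as irrelevant predictors are added, the spread of the OLS estimate of the one real coefficient grows across repeated fits, while the ridge estimate stays comparatively stable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(3)
n, trials = 100, 200

for p in (2, 50, 90):
    ols_est, ridge_est = [], []
    for _ in range(trials):
        X = rng.normal(size=(n, p))
        y = 2 * X[:, 0] + rng.normal(size=n)   # only the first column matters
        ols_est.append(LinearRegression().fit(X, y).coef_[0])
        ridge_est.append(Ridge(alpha=10.0).fit(X, y).coef_[0])
    print(f"p={p}: OLS std={np.std(ols_est):.3f}, "
          f"ridge std={np.std(ridge_est):.3f}")
```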

Licensed under: CC-BY-SA with attribution