Question

Below is an excerpt from the book An Introduction to Statistical Learning (with Applications in R), from the chapter on linear model selection and regularization:

"In ridge regression, each least squares coefficient estimate is shrunken by the same proportion"

On a simple dataset, I obtained two non-intercept coefficients, b1 = -0.03036156 and b2 = -0.02481822, using OLS. With l2 shrinkage and lambda = 1, the new coefficients were b1 = -0.01227141 and b2 = -0.01887098. The two coefficients have not shrunk by the same proportion. What am I missing here? (A sketch of the comparison I ran is included after the notes below.)

Note:

  1. the assumption made in An Introduction to Statistical Learning for the quoted statement is n = p
  2. both variables in my dataset are on the same scale
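
For reference, here is a minimal sketch of the kind of comparison described above. The data is made up, and the use of scikit-learn's LinearRegression and Ridge (whose alpha parameter plays the role of lambda) is an assumption for illustration, not the exact code I ran:

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    # Made-up data for illustration: two predictors on the same scale
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = X @ np.array([-0.03, -0.02]) + rng.normal(scale=0.1, size=100)

    # Ordinary least squares coefficients
    ols = LinearRegression().fit(X, y)
    print("OLS:  ", ols.coef_)

    # Ridge (l2) coefficients with lambda = 1
    ridge = Ridge(alpha=1.0).fit(X, y)
    print("Ridge:", ridge.coef_)

    # Per-coefficient shrinkage ratios; in general these are not equal
    print("ratios:", ridge.coef_ / ols.coef_)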

Solution

As far as I know, ridge regression minimizes the following objective:

\begin{equation} RSS_{Ridge} = \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^2 + \lambda \sum_{j=1}^{p} \beta_{j}^2 \end{equation}

First of all, it seems to me that a larger lambda does not shrink the coefficients in simple inverse proportion to lambda itself: beta enters the penalty squared while lambda enters linearly, so the relationship between lambda and the fitted coefficients depends on the problem.
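
In the simple setting the book assumes (n = p and X the identity matrix), the ridge problem separates coordinate-wise and has a closed form. A short derivation of that case (my addition, following the book's setup, where the OLS estimate is simply beta_j = y_j):

\begin{equation} \min_{\beta_j} \; (y_j - \beta_j)^2 + \lambda \beta_j^2 \quad \Rightarrow \quad \hat{\beta}_j^{Ridge} = \frac{y_j}{1 + \lambda} = \frac{\hat{\beta}_j^{OLS}}{1 + \lambda} \end{equation}

Every coefficient is multiplied by the same factor 1/(1 + lambda), which is the "same proportion" the quoted sentence refers to. A general design matrix does not yield this closed form, so the proportions can differ.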

I think you are referring to page 226 of "An Introduction to Statistical Learning". In the caption of the figure there, the authors write:

"The ridge regression and lasso coefficient estimates for a simple setting with n = p and X 

a diagonal matrix with 1’s on the diagonal. Left: The ridge regression coefficient estimates

are shrunken proportionally towards zero, relative to the least-squares estimates.

Right: The lasso coefficient estimates are soft-thresholded towards zero."

The figure shows that if the OLS model gives coefficients +2 and -2, and ridge shrinks +2 to +1.8, then we can be sure ridge shrinks -2 to -1.8. In both cases the coefficient is multiplied by the same factor, 1.8/2 = 0.9, so both move towards zero by the same proportion (a 10% reduction). Your dataset does not satisfy the n = p, diagonal-X assumption, which is why your two coefficients shrink by different proportions.
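
To make this concrete, here is a small numerical check (my own sketch, not from the book) using the closed-form ridge solution: when X is the identity, every coefficient shrinks by the same factor 1/(1 + lambda), while for a general X the shrinkage proportions differ.

    import numpy as np

    def ridge_coefs(X, y, lam):
        # Closed-form ridge solution: (X'X + lam*I)^{-1} X'y
        p = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

    rng = np.random.default_rng(0)

    # Case 1: the book's setting, X = identity (n = p), so beta_OLS = y
    X1 = np.eye(3)
    y1 = rng.normal(size=3)
    ols1 = np.linalg.lstsq(X1, y1, rcond=None)[0]
    print(ridge_coefs(X1, y1, lam=1.0) / ols1)  # all ratios equal 1/(1+lam) = 0.5

    # Case 2: a general X with no special structure
    X2 = rng.normal(size=(50, 3))
    y2 = rng.normal(size=50)
    ols2 = np.linalg.lstsq(X2, y2, rcond=None)[0]
    print(ridge_coefs(X2, y2, lam=1.0) / ols2)  # ratios differ across coefficients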
