Question

While reviewing scikit-learn's OneHotEncoder documentation (quoted below), I noticed that it is not recommended to drop the first category when applying regularization (e.g., lasso, ridge, etc.). While I understand how dropping the first category prevents collinearity, I am unsure why keeping all categories is needed for regularized regression. Wouldn't this add an extra dimension that then needs to be regularized?

drop : {‘first’, ‘if_binary’}

Specifies a methodology to use to drop one of the categories per feature. This is useful in situations where perfectly collinear features cause problems, such as when feeding the resulting data into a neural network or an unregularized regression. However, dropping one category breaks the symmetry of the original representation and can therefore induce a bias in downstream models, for instance for penalized linear classification or regression models.


Solution

In unregularized linear regression you have to leave out one column per encoded feature: otherwise the one-hot columns sum exactly to the intercept column, so the columns of the design matrix are linearly dependent, XᵀX is singular, and its inverse cannot be computed.
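
A minimal sketch (using NumPy) of this rank deficiency: with an intercept column plus the full one-hot encoding of a three-level feature, the one-hot columns sum to the intercept column, so XᵀX loses full rank; dropping one category restores it.

```python
import numpy as np

# Three samples of one categorical feature with levels A, B, C.
# Columns: intercept, A, B, C.
X = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
])

# The one-hot columns sum to the intercept column, so XᵀX is a 4x4
# matrix of rank 3 -- singular, hence not invertible.
gram = X.T @ X
print(gram.shape, np.linalg.matrix_rank(gram))  # (4, 4) 3

# Dropping the first category (column for A) restores full column rank.
X_drop = X[:, [0, 2, 3]]
print(np.linalg.matrix_rank(X_drop.T @ X_drop))  # 3 (full rank for 3 columns)
```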

Regularization, however, takes care of the singularity: the penalty term makes the matrix being inverted (for ridge, XᵀX + λI) almost surely nonsingular, so there is no need to drop a column. Moreover, once a penalty is applied, the choice of dropped column matters: the dropped category gets absorbed into the (usually unpenalized) intercept while the remaining coefficients are shrunk toward zero, so dropping different columns from each feature leads to different predictions, i.e., it induces a bias.
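
A small sketch (assuming scikit-learn; the data and alpha value are illustrative) of both points: ridge fits the redundant full encoding without trouble, and dropping a different category yields a different penalized fit.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
cats = rng.integers(0, 3, size=200)        # one categorical feature, 3 levels
X_full = np.eye(3)[cats]                   # full one-hot encoding (redundant)
y = np.array([1.0, 5.0, 9.0])[cats] + rng.normal(0, 0.1, 200)

# Ridge handles the collinear full encoding: the penalty makes the
# problem well-posed, no column needs to be dropped.
full = Ridge(alpha=1.0).fit(X_full, y)

# Two "equivalent" encodings: drop category 0 vs. drop category 2.
drop0 = Ridge(alpha=1.0).fit(X_full[:, 1:], y)
drop2 = Ridge(alpha=1.0).fit(X_full[:, :2], y)

# Predict one test point per category under each encoding.
x_new = np.eye(3)
p0 = drop0.predict(x_new[:, 1:])
p2 = drop2.predict(x_new[:, :2])

# The predictions differ: the dropped category hides in the unpenalized
# intercept, so the shrinkage is applied asymmetrically.
print(np.abs(p0 - p2).max())
```

The asymmetry arises because only the coefficients are penalized, not the intercept; which category the intercept represents therefore changes what gets shrunk.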


Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange