What happens when you have highly correlated columns in a dataset
22-10-2019
Question
I am building a regression model, and I was wondering what the consequence would be of having two or more highly correlated columns in the dataset. Is that something that can decrease the accuracy of the model?
Answering this question would help me decide how to deal with it. Would PCA be the best option here?
Solution
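Having highly correlated features is a form of redundancy in the features. And yes, it does affect a regression model. With strongly correlated predictors (multicollinearity), the model cannot tell which of the correlated columns is responsible for an effect, so the individual coefficient estimates become unstable and hard to interpret, even though overall predictive accuracy often suffers much less. A very nice explanation is given here.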
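To make this concrete, here is a minimal NumPy sketch on synthetic data. The data, the seed and the helper name `variance_inflation_factors` are illustrative assumptions, not taken from the linked explanation. It computes the variance inflation factor, VIF_j = 1 / (1 - R_j^2), for each column, and then fits the same least-squares model on two halves of the data to show that the coefficients of two nearly identical columns can swing from one sample to the next while their sum stays stable.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of every column of X (shape: n_samples x n_features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R^2 obtained by regressing
    column j on all the other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Design matrix: intercept + every column except j
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        vifs[j] = 1.0 / (1.0 - r2)  # grows without bound as r2 -> 1
    return vifs


# Hypothetical synthetic data: x2 is almost a copy of x1, x3 is independent
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
print("VIF:", variance_inflation_factors(X).round(1))

# The true model uses only x1 and x3; x2 is redundant
y = 3.0 * x1 + 2.0 * x3 + rng.normal(scale=0.5, size=n)
A = np.column_stack([np.ones(n), X])

# Fit OLS separately on each half of the data and compare coefficients
half_coefs = []
for rows in (slice(0, n // 2), slice(n // 2, n)):
    coef, *_ = np.linalg.lstsq(A[rows], y[rows], rcond=None)
    half_coefs.append(coef[1:])  # drop the intercept
    print("coef(x1, x2, x3):", coef[1:].round(2),
          " x1+x2:", round(coef[1] + coef[2], 2))
```

The VIFs of `x1` and `x2` come out very large while `x3` stays close to 1. A commonly quoted rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic collinearity.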
PCA is a good choice when it comes to dimensionality reduction, because it replaces the correlated columns with a smaller set of uncorrelated components.
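Below is a minimal sketch of PCA in plain NumPy, assuming the same kind of synthetic data as above. The helper `pca` is an illustrative assumption; in practice scikit-learn's `sklearn.decomposition.PCA` does the same job. The sketch centres the data, takes an SVD and keeps the top components. Because the two near-duplicate columns share almost all of their variance, two components retain nearly all the variance of the three columns, and the resulting features are uncorrelated. If the features are on different scales, standardise them first, since PCA is driven by variance.

```python
import numpy as np


def pca(X, n_components):
    """Minimal PCA via SVD of the centred data matrix.

    Returns (scores, components, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # PCA operates on centred columns
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_var = S ** 2 / (X.shape[0] - 1)
    ratio = explained_var / explained_var.sum()
    components = Vt[:n_components]  # principal axes, one per row
    scores = Xc @ components.T      # data expressed in those axes
    return scores, components, ratio


# Hypothetical setup: x2 nearly duplicates x1, x3 is independent
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

scores, components, ratio = pca(X, n_components=2)
print("explained variance ratio:", ratio.round(4))
# The retained components are uncorrelated with each other
print("correlation of scores:\n", np.corrcoef(scores, rowvar=False).round(6))
```

One trade-off to keep in mind: regression coefficients fitted on the principal components no longer map directly onto the original columns, so PCA buys decorrelated inputs at some cost in interpretability.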
Licensed under: CC-BY-SA with attribution