Question

How do we use a correlation score between two variables for analysing data?

I have a set of 20 features and need to predict a 21st. Is it necessary that the correlation between any two features be close to 1? If two features have a correlation score close to -1, does this mean they contradict each other and thereby decrease accuracy?

So how do we use a correlation score in analysis?

Solution

Correlation between different features should be as low as possible, because strongly correlated features give the predictor the same kind of information or trend to learn from, so only one of them is actually useful for prediction. What matters is the magnitude: a score close to -1 signals the same redundancy as a score close to +1, not a contradiction.
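As a rough illustration of checking features against each other, here is a minimal sketch using NumPy. The data is synthetic, and the helper name `highly_correlated_pairs` and the 0.9 threshold are assumptions made for this example, not part of the original answer.

```python
import numpy as np

def highly_correlated_pairs(X, threshold=0.9):
    """Return (i, j, r) for column pairs of X whose |Pearson r| >= threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns are treated as variables
    n = corr.shape[0]
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i, j]) >= threshold:
                pairs.append((i, j, float(corr[i, j])))
    return pairs

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([
    a,                                      # column 0
    2.0 * a + 0.01 * rng.normal(size=200),  # column 1: near-duplicate of column 0
    -a + 0.01 * rng.normal(size=200),       # column 2: mirror image of column 0
    b,                                      # column 3: independent of the rest
])

pairs = highly_correlated_pairs(X)
for i, j, r in pairs:
    print(f"features {i} and {j}: r = {r:+.3f}")
```

Columns 0, 1 and 2 are flagged as mutually redundant, including the pair whose correlation is close to -1, while the independent column 3 is not flagged.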

Keeping many redundant (correlated) features can degrade accuracy when your sample size is similar to the size of your feature set. Feature selection methods such as recursive feature elimination, or dimensionality reduction methods such as PCA, can help you reduce the feature set to a more suitable size.
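To show how correlated features collapse under PCA, here is a small sketch that computes the explained variance ratio with a plain NumPy SVD. The synthetic data, the helper name `pca_explained_variance_ratio` and the 99% variance cutoff are assumptions for this example; in practice you would probably standardize the features first and use a library implementation.

```python
import numpy as np

def pca_explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                      # PCA requires centered data
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    variances = s ** 2 / (X.shape[0] - 1)        # variance along each component
    return variances / variances.sum()

# Synthetic data (assumed): three columns carry the same signal, one is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([
    a,
    a + 0.01 * rng.normal(size=200),
    -a + 0.01 * rng.normal(size=200),
    b,
])

ratio = pca_explained_variance_ratio(X)
# Smallest number of components whose cumulative variance reaches 99%.
n_components = int(np.searchsorted(np.cumsum(ratio), 0.99) + 1)
print("explained variance ratio:", np.round(ratio, 4))
print("components needed for 99% of the variance:", n_components)
```

Four columns reduce to two components here, because three of them carry essentially the same information.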

In predictive analysis, the correlation score is calculated between each feature and the target variable. When using linear regression to model a data set, we first check whether the plot of each feature against the target follows an upward trend (positive correlation) or a downward trend (negative correlation) rather than being scattered randomly. If such a relationship exists, regression modelling on the data is likely to work well.
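The feature-versus-target check can be sketched numerically as well as visually. The example below uses synthetic data, and the helper name `feature_target_correlations` is an assumption for illustration. Note that Pearson correlation only measures linear association, so a score near zero does not rule out a nonlinear relationship.

```python
import numpy as np

def feature_target_correlations(X, y):
    """Pearson correlation between each column of X and the target y."""
    return np.array([np.corrcoef(X[:, k], y)[0, 1] for k in range(X.shape[1])])

# Synthetic data (assumed): feature 0 pushes the target up, feature 1 pushes
# it down, and feature 2 has no effect on the target.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * rng.normal(size=n)

r = feature_target_correlations(X, y)
for k, rk in enumerate(r):
    trend = "upward" if rk > 0 else "downward"
    print(f"feature {k}: r = {rk:+.2f} ({trend} trend against the target)")
```

Features with a correlation of large magnitude, positive or negative, are the ones a linear model can exploit; the feature with a correlation near zero contributes little on its own.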

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange