cor shows only NA or 1 for correlations - Why?
05-10-2019
Question
I'm running cor() on a data.frame with all numeric values and I'm getting this as the result:
        price exprice ...
price       1      NA
exprice    NA       1
...
So it's either 1 or NA for each value in the resulting table. Why are the NAs showing up instead of valid correlations?
Solution
The 1s appear because every variable is perfectly correlated with itself, and the NAs appear because there are NAs in your variables.
You will have to specify how you want R to compute the correlation when there are missing values, because the default (use = "everything") returns NA for any pair in which either variable contains a missing value.
You can change this behavior with the use argument to cor; see ?cor for details.
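As a rough illustration of how the use argument changes the result, here is a minimal sketch. The data frame is made up (only the column names price and exprice come from the question, and qty is a hypothetical extra column), so treat it as an assumption-laden example rather than the asker's actual data.

```r
# Hypothetical data: the column names mirror the question, the values are made up.
data <- data.frame(
  price   = c(10, 12, NA, 15, 18),
  exprice = c(11, 13, 14, NA, 19),
  qty     = c(5, 4, 4, 3, 2)
)

# Default (use = "everything"): any pair involving a column with an NA gives NA.
default_cor <- cor(data)

# "complete.obs": drop every row that has an NA in any column, then correlate.
complete_cor <- cor(data, use = "complete.obs")

# "pairwise.complete.obs": for each pair, use the rows that are complete for that pair.
pairwise_cor <- cor(data, use = "pairwise.complete.obs")

print(default_cor)
print(complete_cor)
print(pairwise_cor)
```

Which option is appropriate depends on the data: "complete.obs" keeps every coefficient based on the same rows, while "pairwise.complete.obs" uses more of the data but bases different coefficients on different subsets of rows.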
OTHER TIPS
Tell cor to ignore the NAs with the use argument, e.g.:
cor(data$price, data$exprice, use = "complete.obs")
NAs also appear if there are attributes with zero variance (with all elements equal); see for instance:
cor(cbind(a=runif(10),b=rep(1,10)))
which returns:
   a  b
a  1 NA
b NA  1
Warning message:
In cor(cbind(a = runif(10), b = rep(1, 10))) :
the standard deviation is zero
The NA can actually be due to two reasons. One is that there is an NA in your data. The other is that one of the variables is constant. A constant variable has a standard deviation of zero, and hence the cor function returns NA.
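To find out which of the two causes applies to a given data frame, something along these lines may help. The helper diagnose_cor_na and the example data are hypothetical (neither is part of base R or of the original question); this is only a sketch that assumes all columns are numeric.

```r
# Hypothetical helper (not part of base R): report, per numeric column,
# how many NAs it has and whether its non-missing values are all equal.
diagnose_cor_na <- function(df) {
  data.frame(
    column   = names(df),
    n_na     = vapply(df, function(x) sum(is.na(x)), integer(1)),
    constant = vapply(df, function(x) length(unique(x[!is.na(x)])) <= 1, logical(1)),
    row.names = NULL
  )
}

# Made-up example data: `price` has a missing value, `flag` never varies.
example <- data.frame(
  price   = c(10, 12, NA, 15),
  exprice = c(11, 14, 13, 17),
  flag    = c(1, 1, 1, 1)
)

report <- diagnose_cor_na(example)
print(report)

# Drop constant columns, then let cor() handle the remaining NAs pairwise.
keep <- example[, !report$constant, drop = FALSE]
print(cor(keep, use = "pairwise.complete.obs"))
```

Columns with n_na greater than zero point to the missing-value cause, and columns flagged as constant point to the zero-variance cause.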