Question

The cor() function returns zero instead of the correct correlation when a vector contains an extremely large number:

foo <- c(1e154, 1, 0)
bar <- c(0, 1, 2)
cor(foo, bar)
# -0.8660254
foo <- c(1e155, 1, 0)
cor(foo, bar)
# 0

Although 1e155 is very large, it is much smaller than the largest number R can represent. I am surprised that R returns a wrong value rather than a more suitable result such as NA or Inf.

Is there a reason for this? How can we make sure we do not run into this situation in our programs?


Solution

Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations. (from http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient)

foo <- c(1e154, 1, 0)
sd(foo)
## [1] 5.773503e+153
foo <- c(1e155, 1, 0)
sd(foo)
## [1] Inf
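
To see how an infinite standard deviation turns into a correlation of exactly zero, the definition above can be applied by hand. This is only an illustrative sketch: R's cor() is implemented internally and does not literally evaluate these three lines, and the names num and den are introduced here for illustration.

```r
foo <- c(1e155, 1, 0)
bar <- c(0, 1, 2)

num <- cov(foo, bar)       # covariance: no squaring of foo involved, stays finite
den <- sd(foo) * sd(bar)   # sd(foo) has overflowed to Inf, so the product is Inf

is.finite(num)             # numerator is an ordinary double
is.infinite(den)           # denominator is infinite
num / den                  # finite / Inf is zero in IEEE 754 arithmetic
```

A finite numerator divided by an infinite denominator is zero, not NA or NaN, which matches the result reported in the question.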

More fundamentally, to calculate sd() you have to square the deviations of x from its mean, which here are of the same order of magnitude as x itself:

1e154^2
## [1] 1e+308
1e155^2
## [1] Inf

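So your number is indeed at the boundary of what can be calculated with 64-bit double-precision arithmetic: the largest representable double is about 1.8e308 (.Machine$double.xmax in R), and 1e155^2 = 1e310 exceeds it and overflows to Inf.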
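
One way to avoid the overflow, offered as a sketch rather than a built-in feature of R, is to rescale each vector before calling cor(). Pearson's correlation does not change when a variable is multiplied by a positive constant, so dividing by the largest absolute value keeps every squared term at or below 1. The names safe_cor and scale_to_unit are hypothetical helpers, not part of base R.

```r
# Hypothetical helper: rescale both inputs into [-1, 1] before correlating.
safe_cor <- function(x, y) {
  scale_to_unit <- function(v) {
    m <- max(abs(v), na.rm = TRUE)
    # Only rescale when the largest magnitude is a usable, nonzero number
    if (is.finite(m) && m > 0) v / m else v
  }
  cor(scale_to_unit(x), scale_to_unit(y))
}

foo <- c(1e155, 1, 0)
bar <- c(0, 1, 2)
safe_cor(foo, bar)   # agrees with cor(c(1e154, 1, 0), bar), about -0.866
```

Values that are tiny relative to the maximum may underflow towards zero after rescaling, but they contribute negligibly to the correlation in that case anyway.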

Using R-2.15.2 on Windows I get:

cor(c(1e555, 1, 0), 1:3)
## [1] NaN
Licensed under: CC-BY-SA with attribution