Wrong correlation result for big numbers

https://stackoverflow.com/questions/14339952

15-01-2022

Question

The cor() function fails to compute the correlation when a vector contains extremely large numbers; it simply returns zero:

foo <- c(1e154, 1, 0)
bar <- c(0, 1, 2)
cor(foo, bar)
# -0.8660254
foo <- c(1e155, 1, 0)
cor(foo, bar)
# 0

Although 1e155 is very big, it is much smaller than the largest number R can represent. I find it surprising that R returns a wrong value rather than a more suitable result such as NA or Inf.

Is there a reason for that? How can we be sure we will not run into this situation in our programs?

Solution

Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations. (from http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient)

foo <- c(1e154, 1, 0)
sd(foo)
## [1] 5.773503e+153
foo <- c(1e155, 1, 0)
sd(foo)
## [1] Inf
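
To connect these sd() values to the 0 in the question, here is a minimal sketch that evaluates the definition by hand with cov() and sd(). The variable names are illustrative only, and this is not necessarily how cor() works internally; it just shows that a finite covariance divided by an infinite standard deviation comes out as zero.

```r
bar <- c(0, 1, 2)

# Case that still works: every intermediate value fits in a double
foo_ok <- c(1e154, 1, 0)
r_ok <- cov(foo_ok, bar) / (sd(foo_ok) * sd(bar))

# Case that breaks: sd() overflows to Inf while cov() stays finite
foo_big <- c(1e155, 1, 0)
r_big <- cov(foo_big, bar) / (sd(foo_big) * sd(bar))

print(r_ok)                          # matches cor(foo_ok, bar)
print(is.finite(cov(foo_big, bar)))  # the covariance itself is fine
print(r_big)                         # finite value divided by Inf
```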

More fundamentally, computing sd() requires squaring values on the scale of x, and that square is what overflows:

1e154^2
## [1] 1e+308

1e155^2
## [1] Inf

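So your number is indeed right at the boundary of what can be computed with 64-bit double-precision floating point. The value itself is representable, but its square exceeds the largest finite double (about 1.8e308), so sd() overflows to Inf. A finite covariance divided by an infinite standard deviation is 0, which is consistent with what cor() returns.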
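
As a quick check of where that boundary lies, the sketch below reads the largest finite double from .Machine and compares it with the two squares. The name max_double is just an illustrative variable.

```r
# Largest finite double on this machine (about 1.797693e+308 for IEEE 754 doubles)
max_double <- .Machine$double.xmax
print(max_double)

# 1e154^2 is still below the limit, 1e155^2 is not
print(1e154^2 < max_double)
print(is.infinite(1e155^2))

# Rough threshold: magnitudes above sqrt(max_double) overflow when squared
print(sqrt(max_double))
```

Any value whose magnitude exceeds that square root is a candidate for this kind of overflow in variance-like computations.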

Using R-2.15.2 on Windows I get:

cor(c(1e555, 1, 0), 1:3)
## [1] NaN
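
On the second part of the question (how to avoid this in a program): Pearson's correlation does not change when a variable is multiplied by a positive constant, so one possible workaround is to rescale each vector before calling cor(). The following is only a sketch under that assumption; safe_cor is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper (not part of base R): rescale each vector by its
# largest absolute value before calling cor(), so that squared deviations
# stay far below the overflow limit.
safe_cor <- function(x, y) {
  rescale <- function(v) {
    m <- max(abs(v), na.rm = TRUE)
    # Leave the vector alone if it is all zero or its maximum is not finite
    if (is.finite(m) && m > 0) v / m else v
  }
  cor(rescale(x), rescale(y))
}

foo <- c(1e155, 1, 0)
bar <- c(0, 1, 2)
print(safe_cor(foo, bar))
```

Rescaling trades the risk of overflow for a risk of underflow in the smallest values, which is usually harmless for a correlation but is worth keeping in mind.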

Licensed under: CC-BY-SA with attribution
