Вопрос

For data X = [0,0,1,1,0]and Y = [1,1,0,1,1]

>> np.corrcoef(X,Y) 

returns

array([[ 1.        , -0.61237244],
       [-0.61237244,  1.        ]])

However, I cannot reproduce this result using np.var and np.cov given the equation shown in http://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html:

>> np.cov([0,0,1,1,0],[1,1,0,1,1])/sqrt(np.var([0,0,1,1,0])*np.var([1,1,0,1,1]))

array([[ 1.53093109, -0.76546554],
       [-0.76546554,  1.02062073]])

What's going on here?

Это было полезно?

Решение

This is because, np.var default delta degrees of freedom is 0, not 1.

In [57]:

X = [0,0,1,1,0]
Y = [1,1,0,1,1]
np.corrcoef(X,Y) 
Out[57]:
array([[ 1.        , -0.61237244],
       [-0.61237244,  1.        ]])
In [58]:

V = np.sqrt(np.array([np.var(X, ddof=1), np.var(Y, ddof=1)])).reshape(1,-1)
np.matrix(np.cov(X,Y))
Out[58]:
matrix([[ 0.3 , -0.15],
        [-0.15,  0.2 ]])
In [59]:

np.matrix(np.cov(X,Y))/(V*V.T)
Out[59]:
matrix([[ 1.        , -0.61237244],
        [-0.61237244,  1.        ]])

Or looks it the otherway:

In [70]:

V=np.diag(np.cov(X,Y)).reshape(1,-1) #the diagonal elements
In [71]:

np.matrix(np.cov(X,Y))/np.sqrt(V*V.T)
Out[71]:
matrix([[ 1.        , -0.61237244],
        [-0.61237244,  1.        ]])

What is really going on, np.cov(m, y=None, rowvar=1, bias=0, ddof=None), when bias and ddof both not provided, the default normalization is by N-1, N being the number of observation. So, that is equivalent to have delta degrees of freedom of 1. Unfortunately, the default for np.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False) has the default delta degrees of freedom of 0.

Whenever unsure, the safest way is to grab the diagonal elements of the covariance matrix rather than calculate var separately, to ensure consistent behavior.

Другие советы

According to your link (http://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html) you need to be mindful of the indices...

c = np.cov([0,0,1,1,0],[1,1,0,1,1])
corrcoef = [[ c[0,0]/np.sqrt(c[0,0]*c[0,0]), c[0,1]/np.sqrt(c[0,0]*c[1,1]) ],
           [ c[1,0]/np.sqrt(c[1,1]*c[0,0]), c[1,1]/np.sqrt(c[1,1]*c[1,1]) ]]

print corrcoef
# [[1.0, -0.61237243569579447], [-0.61237243569579447, 1.0]]

It's right!

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top