質問

For data X = [0,0,1,1,0]and Y = [1,1,0,1,1]

>> np.corrcoef(X,Y) 

returns

array([[ 1.        , -0.61237244],
       [-0.61237244,  1.        ]])

However, I cannot reproduce this result using np.var and np.cov given the equation shown in http://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html:

>> np.cov([0,0,1,1,0],[1,1,0,1,1])/sqrt(np.var([0,0,1,1,0])*np.var([1,1,0,1,1]))

array([[ 1.53093109, -0.76546554],
       [-0.76546554,  1.02062073]])

What's going on here?

役に立ちましたか?

解決

This is because, np.var default delta degrees of freedom is 0, not 1.

In [57]:

X = [0,0,1,1,0]
Y = [1,1,0,1,1]
np.corrcoef(X,Y) 
Out[57]:
array([[ 1.        , -0.61237244],
       [-0.61237244,  1.        ]])
In [58]:

V = np.sqrt(np.array([np.var(X, ddof=1), np.var(Y, ddof=1)])).reshape(1,-1)
np.matrix(np.cov(X,Y))
Out[58]:
matrix([[ 0.3 , -0.15],
        [-0.15,  0.2 ]])
In [59]:

np.matrix(np.cov(X,Y))/(V*V.T)
Out[59]:
matrix([[ 1.        , -0.61237244],
        [-0.61237244,  1.        ]])

Or looks it the otherway:

In [70]:

V=np.diag(np.cov(X,Y)).reshape(1,-1) #the diagonal elements
In [71]:

np.matrix(np.cov(X,Y))/np.sqrt(V*V.T)
Out[71]:
matrix([[ 1.        , -0.61237244],
        [-0.61237244,  1.        ]])

What is really going on, np.cov(m, y=None, rowvar=1, bias=0, ddof=None), when bias and ddof both not provided, the default normalization is by N-1, N being the number of observation. So, that is equivalent to have delta degrees of freedom of 1. Unfortunately, the default for np.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False) has the default delta degrees of freedom of 0.

Whenever unsure, the safest way is to grab the diagonal elements of the covariance matrix rather than calculate var separately, to ensure consistent behavior.

他のヒント

According to your link (http://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html) you need to be mindful of the indices...

c = np.cov([0,0,1,1,0],[1,1,0,1,1])
corrcoef = [[ c[0,0]/np.sqrt(c[0,0]*c[0,0]), c[0,1]/np.sqrt(c[0,0]*c[1,1]) ],
           [ c[1,0]/np.sqrt(c[1,1]*c[0,0]), c[1,1]/np.sqrt(c[1,1]*c[1,1]) ]]

print corrcoef
# [[1.0, -0.61237243569579447], [-0.61237243569579447, 1.0]]

It's right!

ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top