Numpy - correlation coefficient and related statistical functions don't give same results

Question 1

This happens because the default delta degrees of freedom (ddof) of np.var is 0, not 1.

In [57]:

X = [0,0,1,1,0]
Y = [1,1,0,1,1]
np.corrcoef(X,Y) 
Out[57]:
array([[ 1.        , -0.61237244],
       [-0.61237244,  1.        ]])
In [58]:

V = np.sqrt(np.array([np.var(X, ddof=1), np.var(Y, ddof=1)])).reshape(1,-1)
np.matrix(np.cov(X,Y))
Out[58]:
matrix([[ 0.3 , -0.15],
        [-0.15,  0.2 ]])
In [59]:

np.matrix(np.cov(X,Y))/(V*V.T)
Out[59]:
matrix([[ 1.        , -0.61237244],
        [-0.61237244,  1.        ]])

Or look at it the other way:

In [70]:

V=np.diag(np.cov(X,Y)).reshape(1,-1) #the diagonal elements
In [71]:

np.matrix(np.cov(X,Y))/np.sqrt(V*V.T)
Out[71]:
matrix([[ 1.        , -0.61237244],
        [-0.61237244,  1.        ]])

What is really going on: for np.cov(m, y=None, rowvar=1, bias=0, ddof=None), when neither bias nor ddof is provided, the default normalization is by N-1, where N is the number of observations. That is equivalent to a delta degrees of freedom of 1. Unfortunately, np.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False) defaults to a delta degrees of freedom of 0.
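To make the mismatch concrete, here is a small sketch using the same X and Y as above. The variable names (var_n, var_n1, r_mixed, r_consistent) are only illustrative. It compares the two np.var normalizations with the diagonal of np.cov, and shows how mixing them inflates the coefficient by a factor of N/(N-1):

```python
import numpy as np

X = [0, 0, 1, 1, 0]
Y = [1, 1, 0, 1, 1]
n = len(X)

c = np.cov(X, Y)            # default normalization: divide by N - 1

var_n = np.var(X)           # ddof=0 (default): divide by N
var_n1 = np.var(X, ddof=1)  # ddof=1: divide by N - 1, matches c[0, 0]

# Mixing normalizations: covariance uses N - 1, variances use N.
r_mixed = c[0, 1] / np.sqrt(np.var(X) * np.var(Y))

# Consistent normalization: everything uses N - 1.
r_consistent = c[0, 1] / np.sqrt(np.var(X, ddof=1) * np.var(Y, ddof=1))

print(var_n, var_n1, c[0, 0])   # roughly 0.24, 0.3, 0.3
print(r_mixed, r_consistent)    # r_mixed is too large in magnitude
```

Only r_consistent agrees with np.corrcoef(X, Y)[0, 1]; r_mixed differs from it by exactly n / (n - 1).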

When in doubt, the safest approach is to take the diagonal elements of the covariance matrix rather than computing the variances separately; this guarantees consistent normalization.
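As a minimal sketch of that advice, the helper below (corr_from_cov is a hypothetical name, not a NumPy function) derives the correlation matrix purely from a covariance matrix. Because the variances come from the same matrix, the choice of ddof cancels out:

```python
import numpy as np

def corr_from_cov(c):
    """Scale a covariance matrix into a correlation matrix.

    The standard deviations are taken from the diagonal of c itself,
    so numerator and denominator always share the same normalization.
    """
    c = np.asarray(c, dtype=float)
    d = np.sqrt(np.diag(c))       # standard deviations
    return c / np.outer(d, d)     # c[i, j] / (d[i] * d[j])

X = [0, 0, 1, 1, 0]
Y = [1, 1, 0, 1, 1]

r_default = corr_from_cov(np.cov(X, Y))          # N - 1 normalization
r_ddof0 = corr_from_cov(np.cov(X, Y, ddof=0))    # N normalization

print(r_default)
print(r_ddof0)
```

Both results should agree with np.corrcoef(X, Y), since the normalization factor appears in the numerator and the denominator alike.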

Question 2

According to your link (http://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html) you need to be mindful of the indices...

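import numpy as np

c = np.cov([0,0,1,1,0],[1,1,0,1,1])
# each entry c[i,j] is divided by sqrt(c[i,i]*c[j,j])
corrcoef = [[ c[0,0]/np.sqrt(c[0,0]*c[0,0]), c[0,1]/np.sqrt(c[0,0]*c[1,1]) ],
           [ c[1,0]/np.sqrt(c[1,1]*c[0,0]), c[1,1]/np.sqrt(c[1,1]*c[1,1]) ]]

print(corrcoef)
# values: [[1.0, -0.612372...], [-0.612372..., 1.0]]
# (exact print formatting depends on the Python and NumPy versions)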

It's right!
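For reference, the index pattern used in that snippet is the relation between the covariance matrix C and the correlation matrix R given in the np.corrcoef documentation, written here in LaTeX:

```latex
R_{ij} = \frac{C_{ij}}{\sqrt{C_{ii}\, C_{jj}}}
```

With i = j the numerator and denominator coincide, which is why the diagonal entries come out as 1.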