This is because np.var's default delta degrees of freedom (ddof) is 0, not 1.
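To see the difference concretely, a minimal sketch with the same X used below:

import numpy as np

X = [0, 0, 1, 1, 0]
np.var(X)          # 0.24 -> divides by N   (ddof=0, np.var's default)
np.var(X, ddof=1)  # 0.3  -> divides by N-1 (what np.cov uses by default)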
In [57]:
X = [0,0,1,1,0]
Y = [1,1,0,1,1]
np.corrcoef(X,Y)
Out[57]:
array([[ 1.        , -0.61237244],
       [-0.61237244,  1.        ]])
In [58]:
V = np.sqrt(np.array([np.var(X, ddof=1), np.var(Y, ddof=1)])).reshape(1,-1)
np.matrix(np.cov(X,Y))
Out[58]:
matrix([[ 0.3 , -0.15],
        [-0.15,  0.2 ]])
In [59]:
np.matrix(np.cov(X,Y))/(V*V.T)
Out[59]:
matrix([[ 1.        , -0.61237244],
        [-0.61237244,  1.        ]])
Or, looking at it the other way:
In [70]:
V = np.diag(np.cov(X, Y)).reshape(1, -1)  # the diagonal elements, i.e. the variances
In [71]:
np.matrix(np.cov(X,Y))/np.sqrt(V*V.T)
Out[71]:
matrix([[ 1.        , -0.61237244],
        [-0.61237244,  1.        ]])
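Both divisions are just the textbook correlation formula r = cov(X, Y) / (std(X) * std(Y)) applied elementwise. For the single off-diagonal entry, for instance:

C = np.cov(X, Y)
C[0, 1] / np.sqrt(C[0, 0] * C[1, 1])  # -0.61237244, matches np.corrcoef(X, Y)[0, 1]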
What is really going on: in np.cov(m, y=None, rowvar=1, bias=0, ddof=None), when bias and ddof are both left at their defaults, the normalization is by N-1, N being the number of observations. That is equivalent to a delta degrees of freedom of 1. Unfortunately, np.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False) defaults to a delta degrees of freedom of 0.
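Note that the correlation itself does not care which ddof you pick, as long as the covariance and the variances use the same one, since the N versus N-1 factor cancels in the ratio. A quick check using the biased (divide-by-N) estimates consistently:

Cb = np.cov(X, Y, bias=1)                                      # divide by N
Vb = np.sqrt(np.array([np.var(X), np.var(Y)])).reshape(1, -1)  # ddof=0, also divide by N
Cb / (Vb * Vb.T)
# array([[ 1.        , -0.61237244],
#        [-0.61237244,  1.        ]])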
Whenever you are unsure, the safest way is to grab the diagonal elements of the covariance matrix rather than calculating var separately, to ensure consistent behavior.
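For example, a small helper along those lines (a sketch; corr_from_cov is a made-up name, not a NumPy function):

def corr_from_cov(x, y):
    # Normalize the covariance matrix by its own diagonal, so the same
    # ddof is used for the covariances and the variances automatically.
    c = np.cov(x, y)
    d = np.sqrt(np.diag(c))
    return c / np.outer(d, d)

corr_from_cov(X, Y)  # same result as np.corrcoef(X, Y)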