Question

Let's generate an array:

import numpy as np

data = np.arange(30).reshape(10,3)
data=data*data
array([[  0,   1,   4],
       [  9,  16,  25],
       [ 36,  49,  64],
       [ 81, 100, 121],
       [144, 169, 196],
       [225, 256, 289],
       [324, 361, 400],
       [441, 484, 529],
       [576, 625, 676],
       [729, 784, 841]])

Then find the eigenvectors of the covariance matrix, sorted by decreasing eigenvalue:

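from numpy import linalg as la  # assumed: the import of la was not shown in the post

mn = np.mean(data, axis=0)
# data is an integer array, so this in-place subtraction truncates the
# float means on older NumPy and raises a casting error on newer releases
data -= mn
C = np.cov(data.T)
evals, evecs = la.eig(C)
idx = np.argsort(evals)[::-1]  # indices of eigenvalues, largest first
evecs = evecs[:,idx]           # eigenvectors are the columns of evecs
print(evecs)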
array([[-0.53926461, -0.73656433,  0.40824829],
       [-0.5765472 , -0.03044111, -0.81649658],
       [-0.61382979,  0.67568211,  0.40824829]])

Now let's run the matplotlib.mlab.PCA function on the data:

import matplotlib.mlab as mlab
mpca=mlab.PCA(data)
print(mpca.Wt)
[[ 0.57731894  0.57740574  0.57732612]
 [ 0.72184459 -0.03044628 -0.69138514]
 [ 0.38163232 -0.81588947  0.43437443]]

Why are the two matrices different? I thought that PCA starts by finding the eigenvectors of the covariance matrix, and that these would be exactly equal to the weights.


Solution

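You need to standardize your data, not just center it: mlab.PCA divides each centered column by its standard deviation before decomposing. The output of np.linalg.eig also has to be transposed to match that of mlab.PCA, because np.linalg.eig returns the eigenvectors as columns while mlab.PCA stores them as the rows of Wt. After that the two results agree up to the sign of each row, since an eigenvector is only defined up to a factor of -1: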

>>> n_data = (data - data.mean(axis=0)) / data.std(axis=0)
>>> evals, evecs = np.linalg.eig(np.cov(n_data.T))
>>> evecs = evecs[:, np.argsort(evals)[::-1]].T
>>> mlab.PCA(data).Wt
array([[ 0.57731905,  0.57740556,  0.5773262 ],
       [ 0.72182079, -0.03039546, -0.69141222],
       [ 0.38167716, -0.8158915 ,  0.43433121]])
>>> evecs
array([[-0.57731905, -0.57740556, -0.5773262 ],
       [-0.72182079,  0.03039546,  0.69141222],
       [ 0.38167716, -0.8158915 ,  0.43433121]])
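
For readers without mlab.PCA at hand (as far as I know it is no longer shipped in recent Matplotlib releases), here is a minimal NumPy-only sketch of the same comparison. It assumes the data is rebuilt as a float array, and it cross-checks the covariance eigenvectors against an SVD of the standardized data instead of against mlab.PCA. The helper names pca_axes and align_signs are made up for this illustration and are not part of any library.

```python
import numpy as np


def pca_axes(data):
    """Return (eigenvalues, axes) of the standardized data.

    Each row of `axes` is one principal axis, largest variance first.
    """
    data = np.asarray(data, dtype=float)
    # Standardize: center each column and divide by its standard deviation.
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    # The covariance matrix is symmetric, so eigh is used here instead of eig.
    evals, evecs = np.linalg.eigh(np.cov(z.T))
    order = np.argsort(evals)[::-1]
    # Eigenvectors come back as columns; transpose so they become rows.
    return evals[order], evecs[:, order].T


def align_signs(rows):
    """Flip each row so that its largest-magnitude entry is positive."""
    rows = np.asarray(rows, dtype=float)
    pivot = np.argmax(np.abs(rows), axis=1)
    signs = np.sign(rows[np.arange(rows.shape[0]), pivot])
    return rows * signs[:, None]


# Same numbers as in the question, but stored as floats from the start.
data = (np.arange(30).reshape(10, 3) ** 2).astype(float)
evals, axes = pca_axes(data)

# Cross-check: the right singular vectors of the standardized data
# should be the same axes, up to the sign of each row.
z = (data - data.mean(axis=0)) / data.std(axis=0)
_, _, vt = np.linalg.svd(z, full_matrices=False)

print(align_signs(axes))
print(align_signs(vt))
```

Fixing the sign by a convention (here, making the largest-magnitude entry of each row positive) is one way to compare two decompositions directly; any other consistent convention would do.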
License: CC-BY-SA with attribution