سؤال

I would like to be able to construct the scores of a principal component analysis using its loadings, but I cannot figure out what the princomp function is actually doing when it computes the scores of a dataset. A toy example:

cc <- matrix(1:24,ncol=4)
PCAcc <- princomp(cc,scores=T,cor=T)
PCAcc$loadings

Loadings:
     Comp.1 Comp.2 Comp.3 Comp.4
[1,]  0.500  0.866              
[2,]  0.500 -0.289  0.816       
[3,]  0.500 -0.289 -0.408 -0.707
[4,]  0.500 -0.289 -0.408  0.707

PCAcc$scores

       Comp.1        Comp.2        Comp.3 Comp.4
[1,] -2.92770 -6.661338e-16 -3.330669e-16      0
[2,] -1.75662 -4.440892e-16 -2.220446e-16      0
[3,] -0.58554 -1.110223e-16 -6.938894e-17      0
[4,]  0.58554  1.110223e-16  6.938894e-17      0
[5,]  1.75662  4.440892e-16  2.220446e-16      0
[6,]  2.92770  6.661338e-16  3.330669e-16      0

My understanding is that the scores are a linear combination of the loadings and the original data rescaled. Trying by "hand":

rescaled <- t(t(cc)-apply(cc,2,mean))
rescaled%*%PCAcc$loadings

     Comp.1        Comp.2        Comp.3 Comp.4
[1,]     -5 -1.332268e-15 -4.440892e-16      0
[2,]     -3 -6.661338e-16 -3.330669e-16      0
[3,]     -1 -2.220446e-16 -1.110223e-16      0
[4,]      1  2.220446e-16  1.110223e-16      0
[5,]      3  6.661338e-16  3.330669e-16      0
[6,]      5  1.332268e-15  4.440892e-16      0

The columns are off by a factor of 1.707825, 2, and 1.333333, respectively. Why is this? Since the toy data matrix has the same variance in each column, normalization shouldn't be necessary here. Any help is greatly appreciated.

Thanks!

لا يوجد حل صحيح

نصائح أخرى

You need

scale(cc,PCAcc$center,PCAcc$scale)%*%PCAcc$loadings

or easier

predict(PCAcc,newdata=cc)
مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top