Pregunta

I would like to be able to construct the scores of a principal component analysis using its loadings, but I cannot figure out what the princomp function is actually doing when it computes the scores of a dataset. A toy example:

cc <- matrix(1:24,ncol=4)
PCAcc <- princomp(cc,scores=T,cor=T)
PCAcc$loadings

Loadings:
     Comp.1 Comp.2 Comp.3 Comp.4
[1,]  0.500  0.866              
[2,]  0.500 -0.289  0.816       
[3,]  0.500 -0.289 -0.408 -0.707
[4,]  0.500 -0.289 -0.408  0.707

PCAcc$scores

       Comp.1        Comp.2        Comp.3 Comp.4
[1,] -2.92770 -6.661338e-16 -3.330669e-16      0
[2,] -1.75662 -4.440892e-16 -2.220446e-16      0
[3,] -0.58554 -1.110223e-16 -6.938894e-17      0
[4,]  0.58554  1.110223e-16  6.938894e-17      0
[5,]  1.75662  4.440892e-16  2.220446e-16      0
[6,]  2.92770  6.661338e-16  3.330669e-16      0

My understanding is that the scores are a linear combination of the loadings and the original data rescaled. Trying by "hand":

rescaled <- t(t(cc)-apply(cc,2,mean))
rescaled%*%PCAcc$loadings

     Comp.1        Comp.2        Comp.3 Comp.4
[1,]     -5 -1.332268e-15 -4.440892e-16      0
[2,]     -3 -6.661338e-16 -3.330669e-16      0
[3,]     -1 -2.220446e-16 -1.110223e-16      0
[4,]      1  2.220446e-16  1.110223e-16      0
[5,]      3  6.661338e-16  3.330669e-16      0
[6,]      5  1.332268e-15  4.440892e-16      0

The columns are off by a factor of 1.707825, 2, and 1.333333, respectively. Why is this? Since the toy data matrix has the same variance in each column, normalization shouldn't be necessary here. Any help is greatly appreciated.

Thanks!

No hay solución correcta

Otros consejos

You need

scale(cc,PCAcc$center,PCAcc$scale)%*%PCAcc$loadings

or easier

predict(PCAcc,newdata=cc)
Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top