Question

I am working with a dataset of 10000 data points and 100 variables in R. Unfortunately, the variables I have do not describe the data well. I carried out a PCA using prcomp(), and the first 3 PCs seem to account for most of the variability in the data. As far as I understand, a principal component is a combination of the original variables; it therefore has a value for each data point and can be treated as a new variable. Can I add these principal components to my data as 3 new variables? I need them for further analysis.

A reproducible dataset:

set.seed(144)
x <- data.frame(matrix(rnorm(2^10*12), ncol=12))
y <- prcomp(formula = ~., data=x, center = TRUE, scale = TRUE, na.action = na.omit)

Solution

The PC scores are stored in the element x of the prcomp() result.

str(y)

List of 6
 $ sdev    : num [1:12] 1.08 1.06 1.05 1.04 1.03 ...
 $ rotation: num [1:12, 1:12] -0.0175 -0.1312 0.3284 -0.4134 0.2341 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:12] "X1" "X2" "X3" "X4" ...
  .. ..$ : chr [1:12] "PC1" "PC2" "PC3" "PC4" ...
 $ center  : Named num [1:12] 0.02741 -0.01692 -0.03228 -0.03303 0.00122 ...
  ..- attr(*, "names")= chr [1:12] "X1" "X2" "X3" "X4" ...
 $ scale   : Named num [1:12] 0.998 1.057 1.019 1.007 0.993 ...
  ..- attr(*, "names")= chr [1:12] "X1" "X2" "X3" "X4" ...
 $ x       : num [1:1024, 1:12] 1.023 -1.213 0.167 -0.118 -0.186 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:1024] "1" "2" "3" "4" ...
  .. ..$ : chr [1:12] "PC1" "PC2" "PC3" "PC4" ...
 $ call    : language prcomp(formula = ~., data = x, na.action = na.omit, center = TRUE, scale = TRUE)
 - attr(*, "class")= chr "prcomp"

You can extract them with y$x and then choose the columns you need.

x.new <- cbind(x, y$x[, 1:3])
str(x.new)

'data.frame':   1024 obs. of  15 variables:
 $ X1 : num  1.14 2.38 0.684 1.785 0.313 ...
 $ X2 : num  -0.689 0.446 -0.72 -3.511 0.36 ...
 $ X3 : num  0.722 0.816 0.295 -0.48 0.566 ...
 $ X4 : num  1.629 0.738 0.85 1.057 0.116 ...
 $ X5 : num  -0.737 -0.827 0.65 -0.496 -1.045 ...
 $ X6 : num  0.347 0.056 -0.606 1.077 0.257 ...
 $ X7 : num  -0.773 1.042 2.149 -0.599 0.516 ...
 $ X8 : num  2.05511 0.4772 0.18614 0.02585 0.00619 ...
 $ X9 : num  -0.0462 1.3784 -0.2489 0.1625 0.6137 ...
 $ X10: num  -0.709 0.755 0.463 -0.594 -1.228 ...
 $ X11: num  -1.233 -0.376 -2.646 1.094 0.207 ...
 $ X12: num  -0.44 -2.049 0.315 0.157 2.245 ...
 $ PC1: num  1.023 -1.213 0.167 -0.118 -0.186 ...
 $ PC2: num  1.2408 0.6077 1.1885 3.0789 0.0797 ...
 $ PC3: num  -0.776 -1.41 0.977 -1.343 0.987 ...
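
To see why the scores can be treated as ordinary new variables, here is a minimal sketch that rebuilds them by hand from the fitted object. It assumes the same simulated data as in the question; the name scores.manual is only illustrative. Each score is the standardised observation projected onto the loadings in y$rotation, so the manual result should agree with y$x up to numerical tolerance.

```r
set.seed(144)
x <- data.frame(matrix(rnorm(2^10 * 12), ncol = 12))
y <- prcomp(~ ., data = x, center = TRUE, scale. = TRUE)

# Standardise with the centre and scale stored in the fit, then project
# the result onto the loadings (the rotation matrix).
scores.manual <- scale(as.matrix(x), center = y$center, scale = y$scale) %*% y$rotation

# Compare with the stored scores, ignoring dimnames and scale() attributes
all.equal(scores.manual, y$x, check.attributes = FALSE)

# predict() applies the same transformation to other observations
scores.new <- predict(y, newdata = x[1:5, ])
```

The predict() call is useful if you later need the same 3 components for observations that were not part of the original fit.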

Other tips

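Didzis Elferts's answer only works if your data, x, has no NAs. With na.action = na.omit, rows containing NAs are dropped before the PCA, so y$x has fewer rows than x and cbind() no longer lines up. Here's how you can add the components if your data does have NAs.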

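library(tidyverse)

# y$x is a matrix; convert it to a data frame so rownames_to_column() accepts it
components <- as.data.frame(y$x) %>% rownames_to_column("id")

# Rows dropped by na.omit have no match and get NA in every PC column
x <- x %>% rownames_to_column("id") %>% left_join(components, by = "id")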
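
A base R alternative, sketched under the assumption that you can refit the PCA: passing na.action = na.exclude instead of na.omit should make the formula method of prcomp() pad the scores with NA rows for the excluded observations, so y$x keeps one row per row of x and a plain cbind() works again. The NAs injected below are hypothetical and only there to make the example self-contained.

```r
set.seed(144)
x <- data.frame(matrix(rnorm(2^10 * 12), ncol = 12))
x[c(3, 10), 1] <- NA  # hypothetical missing values, for illustration only

# na.exclude drops incomplete rows for the fit but remembers where they were
y <- prcomp(~ ., data = x, center = TRUE, scale. = TRUE, na.action = na.exclude)

# The scores are padded back to the original number of rows,
# with NA in the rows that were excluded
x.new <- cbind(x, y$x[, 1:3])
```

This avoids the tidyverse dependency and the extra id column, at the cost of having to refit the model with a different na.action.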
Licensed under: CC-BY-SA with attribution