PCA decomposition with Python: feature relevance

https://stackoverflow.com/questions/22348668

13-06-2023

Question

I'm following up on a related topic: How can I use PCA/SVD in Python for feature selection AND identification? We decompose our data set in Python with PCA, using sklearn.decomposition.PCA. The components_ attribute gives us all the components. Our goal is very similar: we want to take only the first several components (this part is not a problem) and see in what proportion each input feature contributes to each PCA component (to know which features matter most to us). How can this be done? A second question: does the Python library offer other implementations of Principal Component Analysis?

Answer

In what proportion does each input feature contribute to each PCA component (to know which features matter most to us)? How can this be done?

The components_ array has shape (n_components, n_features), so components_[i, j] already gives you the (signed) weight of the contribution of feature j to component i.
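As a minimal sketch of what that layout looks like in practice, the snippet below fits PCA on synthetic random data (the data, the seed, and the variable names are assumptions made purely for illustration) and inspects components_:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data, assumed for illustration: 100 samples, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

pca = PCA(n_components=2).fit(X)

# One row per component, one column per input feature.
print(pca.components_.shape)  # (2, 4)

# Each row is a unit-length direction in feature space, so the
# entries of a row are comparable to each other as signed weights.
row_norms = np.linalg.norm(pca.components_, axis=1)
print(row_norms)
```

Here pca.components_[0, 3] would be the weight of the fourth feature in the first component.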

If you want to get the indices of the top 3 features contributing to component i irrespective of the sign, you can do:

numpy.abs(pca.components_[i]).argsort()[::-1][:3]

Note: the [::-1] notation makes it possible to reverse the order of an array:

>>> import numpy as np
>>> np.array([1, 2, 3])[::-1]
array([3, 2, 1])
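Putting the pieces together, one possible way to report the top features for every component by name is sketched below. The helper top_features, the feature names, and the synthetic data are all hypothetical and not part of scikit-learn; one column is deliberately scaled up so that it dominates the first component.

```python
import numpy as np
from sklearn.decomposition import PCA

def top_features(pca, feature_names, k=3):
    """Hypothetical helper: for each component, return the k feature
    names with the largest absolute weight, strongest first."""
    result = []
    for component in pca.components_:
        idx = np.abs(component).argsort()[::-1][:k]
        result.append([feature_names[j] for j in idx])
    return result

# Synthetic data, assumed for illustration: 200 samples, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 2] *= 10.0  # feature "c" gets a much larger variance than the rest
names = ["a", "b", "c", "d", "e"]

pca = PCA(n_components=2).fit(X)
top = top_features(pca, names, k=3)
print(top[0][0])  # c
```

Because PCA follows variance, the inflated column ends up with the largest weight in the first component, which is what the helper reports first.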

A second question: does the Python library offer other implementations of Principal Component Analysis?

PCA is just a truncated Singular Value Decomposition of the centered dataset. You can use numpy.linalg.svd directly if you wish. Have a look at the source code of the scikit-learn implementation of PCA for details.
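A rough sketch of that equivalence, assuming synthetic data with clearly separated variances per column (so the principal directions are well defined), compares a hand-rolled SVD with scikit-learn's result. The directions are compared by absolute value because each one is only determined up to sign.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data, assumed for illustration: 50 samples, 4 features
# with different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) * np.array([4.0, 3.0, 2.0, 1.0])

k = 2  # number of components to keep

# Center the data, then take the SVD.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt[:k]                                  # principal directions
explained_variance = S[:k] ** 2 / (X.shape[0] - 1)   # variance per direction

# Compare against scikit-learn, using its exact (full) SVD solver.
pca = PCA(n_components=k, svd_solver="full").fit(X)
same_directions = np.allclose(np.abs(components), np.abs(pca.components_))
same_variance = np.allclose(explained_variance, pca.explained_variance_)
print(same_directions, same_variance)
```

The rows of Vt play the role of components_, and the squared singular values divided by n_samples - 1 correspond to explained_variance_.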

Licensed under: CC-BY-SA with attribution

Not affiliated with Stack Overflow