PCA for complex-valued data

https://datascience.stackexchange.com/questions/75733

12-12-2020

Question

I was quite shocked to encounter this error from sklearn's PCA:

ValueError: Complex data not supported

This happens when I try to fit complex-valued data. Is this just unimplemented? Should I go ahead and do it 'manually' with SVD, or is there a catch with complex values?

Solution

Apparently this functionality was left out intentionally, see here. I'm afraid you have to use SVD, but that should be fairly straightforward:

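import numpy as np

def pca(X):
    mean = X.mean(axis=0)
    center = X - mean
    # Dividing by sqrt(n) makes stds**2 the variances (covariance eigenvalues, 1/n normalization).
    # full_matrices=False avoids building the potentially huge n x n matrix U.
    _, stds, pcs = np.linalg.svd(center / np.sqrt(X.shape[0]), full_matrices=False)

    # Rows of pcs (= V^H) are the conjugated principal directions, sorted by decreasing variance.
    return stds**2, pcs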
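Why SVD is enough: for centered data X_c the covariance matrix C = X_c^H X_c / n is Hermitian, so it has real, non-negative eigenvalues (the variances) and orthonormal eigenvectors (the principal directions), and the SVD X_c / sqrt(n) = U S V^H gives C = V S^2 V^H. As a sanity check, here is a minimal sketch on hypothetical random data that computes the same quantities with np.linalg.eigh (the helper name pca_eigh is made up for illustration) and compares them with the SVD route. Eigenvectors are only defined up to a complex phase, so directions are compared through the magnitude of their inner products.

```python
import numpy as np


def pca_eigh(X):
    """Hypothetical helper: PCA via the Hermitian covariance matrix (1/n normalization)."""
    center = X - X.mean(axis=0)
    cov = center.conj().T @ center / X.shape[0]  # Hermitian: cov equals cov.conj().T
    evals, evecs = np.linalg.eigh(cov)  # real eigenvalues in ascending order
    order = np.argsort(evals)[::-1]  # sort by decreasing variance
    return evals[order], evecs[:, order]  # columns are the principal directions


# Hypothetical test data: complex Gaussian columns with different scales.
rng = np.random.default_rng(0)
scales = np.array([3.0, 2.0, 1.0, 0.5])
X = (rng.normal(size=(200, 4)) + 1j * rng.normal(size=(200, 4))) * scales

variances, directions = pca_eigh(X)

# The same quantities from the SVD route (rows of vh are conjugated directions).
center = X - X.mean(axis=0)
_, s, vh = np.linalg.svd(center / np.sqrt(X.shape[0]), full_matrices=False)

print(np.allclose(variances, s**2))  # True
# |v_k^H d_k| == 1 means the k-th directions agree up to a complex phase.
print(np.allclose(np.abs(np.diag(vh @ directions)), 1.0))  # True
```

The SVD route is usually preferred because it never forms C explicitly, which avoids squaring the condition number of the data.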

OTHER TIPS

My implementation mimics the interface of sklearn's PCA (fit, transform, inverse_transform, components_, mean_ and explained_variance_ratio_), so existing code that works with PCA should work with it seamlessly.

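import numpy as np


class ComplexPCA:
    def __init__(self, n_components):
        self.n_components = n_components  # None keeps all min(N, M) components
        self.u = self.s = self.components_ = None
        self.mean_ = None
        self.explained_variance_ = None
        self._total_variance = None

    @property
    def explained_variance_ratio_(self):
        # Fraction of the total variance captured by each kept component.
        return self.explained_variance_ / self._total_variance

    def fit(self, matrix, use_gpu=False):
        self.mean_ = matrix.mean(axis=0)
        centered = matrix - self.mean_  # PCA must be computed on centered data.
        if use_gpu:
            import tensorflow as tf  # torch doesn't handle complex values.
            # tf.linalg.svd returns (s, u, v) with A = u @ diag(s) @ adjoint(v).
            s, u, v = tf.linalg.svd(tf.convert_to_tensor(centered), full_matrices=False)
            s, u, vh = s.numpy(), u.numpy(), np.conj(v.numpy()).T
        else:
            u, s, vh = np.linalg.svd(centered, full_matrices=False)  # full=False ==> num_pc = min(N, M)
        # It would be faster if the SVD were truncated to only n_components instead of min(M, N).
        k = self.n_components
        self.u, self.s = u[:, :k], s[:k]
        # Rows of vh are the conjugated principal directions (vh = V^H). Leave them as
        # rows of the matrix so that the layout is compatible with sklearn's PCA.
        self.components_ = vh[:k]
        all_variances = s**2 / (matrix.shape[0] - 1)  # same normalization as sklearn
        self.explained_variance_ = all_variances[:k]
        self._total_variance = all_variances.sum()
        return self

    def transform(self, matrix):
        data = matrix - self.mean_
        # Project onto the principal directions V = components_^H (conjugate transpose).
        return data @ np.conj(self.components_).T

    def inverse_transform(self, matrix):
        # Map scores back to the original space: X ~ scores @ V^H + mean.
        return matrix @ self.components_ + self.mean_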
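One complex-specific detail to watch is the conjugate transpose. np.linalg.svd returns vh = V^H, so the principal directions are the columns of vh.conj().T; projecting with the plain transpose vh.T uses conj(V) instead, which only gives the same directions when the covariance matrix is real. The sketch below checks this on hypothetical synthetic data (complex Gaussian noise mixed by a random complex matrix A; the sizes n, d and k are made up), and also checks that keeping k components loses exactly the variance of the discarded ones.

```python
import numpy as np

rng = np.random.default_rng(42)
n, d, k = 500, 3, 2  # hypothetical sizes

# Hypothetical data: complex noise mixed by a random complex matrix, so the
# covariance matrix has genuinely complex off-diagonal entries.
Z = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
X = Z @ A + (1 + 2j)  # complex offset so that centering matters

mean = X.mean(axis=0)
center = X - mean
_, s, vh = np.linalg.svd(center, full_matrices=False)  # center = U @ diag(s) @ vh

scores = center @ vh.conj().T  # correct: project onto V = vh^H
scores_wrong = center @ vh.T  # plain transpose: projects onto conj(V)

# For complex input np.var averages |x - mean|**2, so each score variance is s**2 / n.
print(np.allclose(np.var(scores, axis=0), s**2 / n))  # True
# The plain transpose captures less variance in the first component.
print(np.var(scores[:, 0]), np.var(scores_wrong[:, 0]))

# Keep k components and map back: X ~ scores[:, :k] @ vh[:k] + mean.
X_hat = scores[:, :k] @ vh[:k] + mean
# The squared reconstruction error equals the discarded squared singular values.
print(np.isclose(np.sum(np.abs(X - X_hat) ** 2), np.sum(s[k:] ** 2)))  # True

ratio = s**2 / np.sum(s**2)  # explained variance ratio, sums to 1
print(ratio)
```

The variance ratio does not depend on whether you normalize by n or n - 1, since the factor cancels.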

Licensed under: CC-BY-SA with attribution
