Question

I'm getting a memory error when running KernelPCA on a data set of 30,000 texts. RandomizedPCA works fine. I think what's happening is that RandomizedPCA works with sparse arrays and KernelPCA doesn't.

Does anyone have a list of learning methods that are currently implemented with sparse array support in scikits-learn?

Solution

We don't have that yet. You have to read the docstrings of the individual classes for now.
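
Until such a list exists, the docstring check can be partly automated. The following is only a rough sketch: the helper name `mentions_sparse` is made up for illustration, and a plain text search for the word "sparse" is a heuristic, not a guarantee that an estimator handles sparse input. The class names in the usage part assume a recent scikit-learn release and are skipped if the library is not installed.

```python
import inspect


def mentions_sparse(estimator_cls):
    """Return True if the class docstring or its fit() docstring mentions 'sparse'.

    Hypothetical helper: a plain text search, so treat the result as a hint only.
    """
    docs = [inspect.getdoc(estimator_cls) or ""]
    fit = getattr(estimator_cls, "fit", None)
    if fit is not None:
        docs.append(inspect.getdoc(fit) or "")
    return any("sparse" in doc.lower() for doc in docs)


if __name__ == "__main__":
    try:
        # Assumption: a recent scikit-learn that ships these two classes.
        from sklearn.decomposition import KernelPCA, TruncatedSVD
    except ImportError:
        print("scikit-learn is not installed; nothing to inspect")
    else:
        for cls in (KernelPCA, TruncatedSVD):
            print(cls.__name__, "mentions sparse input:", mentions_sparse(cls))
```

A hit only tells you which docstrings are worth reading in full; the parameter description of `fit` is still the place to confirm what input types are accepted.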

Anyway, non-linear models do not tend to work better than linear models on high-dimensional sparse data such as text documents (and they can overfit more easily).
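
There is also a practical cost that likely contributes to the memory error in the question: kernel methods such as KernelPCA build an n_samples x n_samples kernel matrix, and that matrix is dense however sparse the input is. A back-of-the-envelope estimate, assuming 8-byte float64 entries and the 30,000 documents mentioned in the question (the function name is just for illustration):

```python
def kernel_matrix_bytes(n_samples, bytes_per_entry=8):
    """Size in bytes of one dense n_samples x n_samples kernel matrix."""
    return n_samples * n_samples * bytes_per_entry


n_docs = 30_000  # number of texts in the question
size = kernel_matrix_bytes(n_docs)

# Assumption: float64 entries; intermediate copies would add to this figure.
print(f"{size / 1e9:.1f} GB for one kernel matrix")  # prints "7.2 GB for one kernel matrix"
```

This counts a single copy of the matrix only, so actual peak usage may well be higher.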

Licensed under: CC-BY-SA with attribution