Question

I'm working with a large dataset (about 50K observations x 11K features) and I'd like to reduce the dimensionality. This will eventually be used for multi-class classification, so I'd like to extract features that are useful for separating the data. Thus far, I've tried PCA (performed OK, with about 70% overall accuracy using a linear SVM), LDA (very high training accuracy of about 96%, but testing accuracy was only about 61%), and an autoencoder (a dense encoder with three layers of 13000, 1000, and 136 units, which performed about the same as PCA). I've been asked to try a Deep Belief Network (a stack of Restricted Boltzmann Machines) on this problem.

Thus far, I foresee two challenges. First, although I have access to a GPU, I don't see many implementations of DBNs from the major players in the neural net community (e.g., TensorFlow/Keras, PyTorch), which means this will likely need to be implemented on the CPU. Second, a CPU implementation will require significant memory and will be pretty slow. This brings up my question: Are there any implementations of a DBN autoencoder in Python (or R) that are trusted and, optimally, utilize the GPU? If not, what is the preferred method of constructing a DBN in Python? Should I use sklearn?


Solution

Unlike autoencoders, Boltzmann machines (restricted or not) do not have an output layer and are thus classified as deep generative models.

There are a variety of RBM implementations in PyTorch. This one is GPU-compatible (https://github.com/GabrielBianconi/pytorch-rbm) and I have found it particularly helpful in the past.

RBMs can come in quite handy in a variety of tasks, such as:

  • Dimensionality reduction
  • Collaborative filtering for recommender systems
  • Feature learning
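To make the dimensionality-reduction use concrete, here is a minimal NumPy sketch of a Bernoulli RBM trained with one-step contrastive divergence (CD-1); after training, the hidden-unit probabilities serve as the reduced features. The sizes and learning rate are illustrative toy values, not tuned for a 50K x 11K dataset:

```python
import numpy as np

class RBM:
    """Minimal Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def hidden_probs(self, v):
        # P(h=1 | v): these activations are the reduced representation
        return self._sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        # P(v=1 | h)
        return self._sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # Positive phase: drive hidden units from the data
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step back down and up
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        # Gradient approximation: <v h>_data - <v h>_model
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b_v += self.lr * (v0 - pv1).mean(axis=0)
        self.b_h += self.lr * (ph0 - ph1).mean(axis=0)
        return float(np.mean((v0 - pv1) ** 2))  # reconstruction error

# Toy usage: random binary data, 20 -> 5 dimensions
X = (np.random.default_rng(1).random((200, 20)) < 0.3).astype(float)
rbm = RBM(n_visible=20, n_hidden=5)
errs = [rbm.cd1_step(X) for _ in range(100)]
features = rbm.hidden_probs(X)  # 200 x 5 reduced representation
```

Note that reconstruction error is only a rough progress signal for CD training, and binary (or [0, 1]-scaled) inputs are assumed; for real-valued features a Gaussian-Bernoulli RBM is the usual variant.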

This is an interesting read in case you want to find out more about RBMs: https://heartbeat.fritz.ai/guide-to-restricted-boltzmann-machines-using-pytorch-ee50d1ed21a8
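On the sklearn part of the question: scikit-learn ships a CPU-only `BernoulliRBM`, and a rough stand-in for greedy layer-wise DBN pretraining is to chain RBMs in a `Pipeline`, where each RBM is fit on the previous one's transformed output. This is only a sketch on toy data (the layer sizes and hyperparameters are made up), and it omits the joint fine-tuning a true DBN would get:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = (rng.random((300, 50)) < 0.4).astype(float)  # toy binary data
y = rng.integers(0, 3, size=300)                 # 3 fake classes

# Greedy layer-wise "DBN": each RBM trains on the previous layer's
# features; a classifier sits on top of the final latent code.
dbn = Pipeline([
    ("rbm1", BernoulliRBM(n_components=30, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=10, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
dbn.fit(X, y)

codes = dbn[:-1].transform(X)  # 300 x 10 reduced representation
preds = dbn.predict(X)
```

Given the dataset size in the question, this will be slow on CPU, which is why a GPU-capable PyTorch implementation like the one linked above is probably the more practical route.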

Licensed under: CC-BY-SA with attribution