Question

At the moment I'm playing with Restricted Boltzmann Machines and, since I'm at it, I would like to try to classify handwritten digits with them.

The model I created is now quite a fancy generative model, but I don't know how to go further with it.

In this article the author says that after creating a good generative model, one "then trains a discriminative classifier (i.e., linear classifier, Support Vector Machine) on top of the RBM using the labelled samples" and further states "since you propagate the data vectors to the hidden units of the RBM model to get hidden unit vectors, or a higher-level representation of the data". The problem is that I'm not sure whether I understand that correctly.

Does that mean all I have to do is propagate the input to the hidden units and there I have my RBM feature for classification?

Can somebody explain this process to me?

Solution

Review of Restricted Boltzmann Machines

A restricted Boltzmann machine (RBM) is a generative model, which learns a probability distribution over the input. That means that, after being trained, the RBM can generate new samples from the learned probability distribution. The probability distribution over the visible units $\mathbf{v}$ is given by $$p(\mathbf{v} \mid \mathbf{h}) = \prod_{i=0}^V p(v_i \mid \mathbf{h}),$$ where $$p(v_i \mid \mathbf{h}) = \sigma\left( a_i + \sum_{j=0}^H w_{ji} h_j \right)$$ and $\sigma$ is the sigmoid function, $a_i$ is the bias for the visible node $i$, and $w_{ji}$ is the weight from $h_j$ to $v_i$. From these two equations, it follows that $p(\mathbf{v} \mid \mathbf{h})$ only depends on the hidden states $\mathbf{h}$. That means that the information on how a visible sample $\mathbf{v}$ is generated has to be stored in the hidden units, the weights, and the biases.
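
As a minimal NumPy sketch of that generative direction (the tiny weight and bias arrays below are made up purely for illustration, not taken from a trained model):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical, tiny RBM: 6 visible units, 3 hidden units.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 6))  # w[j, i]: weight from h_j to v_i
a = np.zeros(6)                          # visible biases a_i
h = np.array([1, 0, 1])                  # some hidden state

# p(v_i = 1 | h) = sigmoid(a_i + sum_j w_ji h_j), computed for all i at once
p_v_given_h = sigmoid(a + h @ W)

# Sampling a visible vector from this distribution generates new data
v_sample = (rng.random(6) < p_v_given_h).astype(int)
print(p_v_given_h, v_sample)
```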

Using RBMs for classification

When using RBMs for classification tasks, you use the following idea: as the information on how your training or test data was generated is saved in the hidden units $\mathbf{h}$, you can extract these underlying factors by feeding a training sample into the visible units of the RBM, propagating it forward to the hidden units, and using this vector of hidden units as a feature vector. You don't do any backward pass to the visible units anymore.
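
For concreteness, here is a minimal NumPy sketch of that forward pass; the weights and biases below are just random placeholders standing in for your trained RBM's parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Placeholders standing in for the *trained* RBM parameters:
# W has shape (n_hidden, n_visible), b holds the hidden biases.
rng = np.random.default_rng(0)
n_visible, n_hidden = 784, 500
W = rng.normal(scale=0.01, size=(n_hidden, n_visible))
b = np.zeros(n_hidden)

def rbm_features(v):
    """Forward pass only: p(h_j = 1 | v), used as the feature vector."""
    return sigmoid(b + W @ v)

v = rng.integers(0, 2, size=n_visible)  # e.g. one binarized 28x28 digit, flattened
features = rbm_features(v)              # this vector goes to the classifier
```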

This hidden vector is just a transformed version of the input data; it cannot classify anything by itself. To do a classification, you would train any classifier (linear classifier, SVM, a feedforward neural network, or anything else) on the hidden vectors instead of the "raw" training data.
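
For example, scikit-learn's BernoulliRBM (a stand-in for your own RBM implementation) can be chained with a logistic regression in exactly this way; the dataset, layer size, and hyperparameters below are only illustrative:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

# Small handwritten-digit dataset; pixel values scaled to [0, 1] for the RBM.
X, y = load_digits(return_X_y=True)
X = X / 16.0
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBM learns the features (unsupervised); logistic regression classifies them.
model = Pipeline([
    ("rbm", BernoulliRBM(n_components=100, learning_rate=0.05, n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```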

If you are building a deep belief network (DBN) - which was used to pre-train deep feed-forward neural networks in an unsupervised fashion - you would take this hidden vector and use it as the input to a new RBM, which you stack on top of it. That way, you can train the network layer-by-layer until reaching the desired size, without needing any labeled data. Finally, you'd add e.g. a softmax layer to the top, and train the whole network with backpropagation on your classification task.
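
As a rough sketch of that layer-by-layer idea, again with scikit-learn (which can do the greedy unsupervised pretraining, but not the final backpropagation fine-tuning through the RBM layers):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM

X, y = load_digits(return_X_y=True)
X = X / 16.0

# Greedy layer-wise pretraining: each RBM is trained on the hidden
# representation produced by the one below it. No labels are used here.
layer_sizes = [128, 64]
features = X
rbms = []
for size in layer_sizes:
    rbm = BernoulliRBM(n_components=size, learning_rate=0.05, n_iter=20, random_state=0)
    features = rbm.fit_transform(features)
    rbms.append(rbm)

# Supervised "softmax layer" on top of the final hidden representation.
# (A real DBN would then fine-tune *all* layers with backpropagation,
# which scikit-learn's BernoulliRBM does not support.)
clf = LogisticRegression(max_iter=1000).fit(features, y)
print("training accuracy:", clf.score(features, y))
```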

OTHER TIPS

@hbaderts described the whole workflow perfectly. However, it may not make much sense if you are completely new to this idea, so I am going to explain it in layman's terms (omitting some details):

Think of a deep network as a function that transforms your data. Examples of transformations include normalization, taking the log of the data, etc. The deep network you are training has multiple layers, each of which is trained using some kind of learning algorithm. For the first layer, you pass the original data as the input and try to learn a function that gives you back that "same original data" as the output. However, you don't get a perfect output, so what you get from the first layer is a transformed version of your input.
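
If it helps, here is a small illustration of that "get the same data back" idea using scikit-learn's BernoulliRBM (only a stand-in, with arbitrary hyperparameters); one Gibbs step maps the input to the hidden units and back, and the imperfect result is exactly the transformed version mentioned above:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM

X, _ = load_digits(return_X_y=True)
X = X / 16.0

# Train the first layer to model (and hence reconstruct) its own input.
rbm = BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20, random_state=0)
rbm.fit(X)

# One Gibbs step: visible -> hidden -> visible again. The result is the
# (imperfect) reconstruction referred to above.
reconstruction = rbm.gibbs(X[:1])
print("mean absolute reconstruction error:", np.abs(X[:1] - reconstruction).mean())
```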

Now, for the second layer, you take those "transformed data" and pass them as input and repeat the whole learning process. You keep doing that for all the layers in your deep network.

At the last layer, what you get is a "transformed version" of your original input data. This can be thought of as a higher-level abstraction of your original input data. Note that you have not used the labels/output in your deep network yet; everything up to this point is unsupervised learning. This is called layer-wise pre-training.

Now, you want to train a classifier/regression model, and this is a supervised learning problem. The way you achieve that goal is by taking the "final transformed version" of your original input from the last layer of your deep network and using it as the input to any classifier (e.g. a kNN classifier, a softmax classifier, logistic regression, etc.). This is called stacking.

When you are training this last-step classifier/learner, you propagate the errors back through the complete network. This ensures that you are able to learn from the labels/outputs and modify the learned layer-wise parameters accordingly.

So, once you have your generative model trained, take the output of your generative model and use it as input to a classifier/learner. Let the error flow through the whole network as learning continues, so that you can modify the layer-wise parameters learned in the earlier steps.
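
Below is a rough NumPy sketch of that fine-tuning step, assuming a single layer pretrained with scikit-learn's BernoulliRBM and a softmax layer on top; the hyperparameters are arbitrary, and the point is only to show the classification error flowing back into the pretrained weights:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

X, y = load_digits(return_X_y=True)
X = X / 16.0
Y = np.eye(10)[y]  # one-hot labels

# Unsupervised pre-training gives the initial hidden-layer weights.
rbm = BernoulliRBM(n_components=100, learning_rate=0.05, n_iter=20, random_state=0)
rbm.fit(X)
W1 = rbm.components_.T.copy()        # visible -> hidden weights
b1 = rbm.intercept_hidden_.copy()    # hidden biases
W2 = np.zeros((100, 10))             # hidden -> softmax weights
b2 = np.zeros(10)

# Supervised fine-tuning: the classification error is backpropagated
# through the pre-trained layer as well, adjusting W1 and b1.
lr = 0.1
for _ in range(200):
    H = sigmoid(X @ W1 + b1)           # forward pass
    P = softmax(H @ W2 + b2)
    dZ2 = (P - Y) / len(X)             # softmax / cross-entropy gradient
    dZ1 = (dZ2 @ W2.T) * H * (1 - H)   # backprop into the pretrained layer
    W2 -= lr * (H.T @ dZ2)
    b2 -= lr * dZ2.sum(axis=0)
    W1 -= lr * (X.T @ dZ1)
    b1 -= lr * dZ1.sum(axis=0)

print("training accuracy:", (P.argmax(axis=1) == y).mean())
```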

You can train stacked RBMs on your images, and then train the final RBM on a concatenation of the output from the RBM stack and the labels. Then you can actually use the RBM itself for classification. This approach is explained in the article by Hinton et al., A Fast Learning Algorithm for Deep Belief Nets; you can also have a look at this demo.
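
A sketch of how that concatenation could look, again using scikit-learn's BernoulliRBM as a stand-in (this shows only the training side; the free-energy comparison that Hinton et al. use to actually read a label back out of the top RBM is not shown):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM

X, y = load_digits(return_X_y=True)
X = X / 16.0

# Lower part of the stack: extract features from the images.
feature_rbm = BernoulliRBM(n_components=100, learning_rate=0.05, n_iter=20, random_state=0)
H = feature_rbm.fit_transform(X)

# Top-level "joint" RBM: its visible layer is the concatenation of the
# extracted features and the one-hot encoded labels.
labels_one_hot = np.eye(10)[y]
joint_visible = np.hstack([H, labels_one_hot])
joint_rbm = BernoulliRBM(n_components=200, learning_rate=0.05, n_iter=20, random_state=0)
joint_rbm.fit(joint_visible)

# To classify a new image, Hinton et al. clamp its features, try each of the
# ten possible label vectors in turn, and pick the one with the lowest free
# energy; that comparison is omitted here.
```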

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange