Question

In an RBM, in the positive phase for updating the hidden layer (which should also be binary), consider a single node h1 ∈ H (the hidden layer vector). To make h1 binary, we compute the probability of turning the hidden unit on by applying the activation function to its total input (since the activation function I am using is the sigmoid, this gives values in the range between 0 and 1). My doubt is: how do we make it binary using the computed probability? I don't think "if P >= 0.5, make it 1, else 0" is a proper method.

From some literature review, I found this document (by Hinton); in section 3.1 he states that "the hidden unit turns on if this probability is greater than a random number uniformly distributed between 0 and 1". What does this actually mean? And in this link, they say "Then the jth unit is on if upon choosing a uniformly distributed random number between 0 and 1 we find that its value is less than sig[j]. Otherwise it is off." I didn't get this either. Is the random number generated the same for all h ∈ H? Another query: what about the random number in the next sampling iteration?

I also saw this video (just watch from the point given in the link). How do you get that sampled number? Do we just run rand() in MATLAB and use that? Should it be different for each h(i) (oh no, I don't think the machine will learn properly)? Should the random number be different for each iteration, or can the same random number be used for all iterations?


Solution

As you correctly say, we calculate the probability of a hidden unit $h_j$ being one and then make it binary. That probability is given by $$p(h_j=1) = \sigma\left(b_j + \sum_{i=1}^V w_{ij}v_i \right)$$ where $\sigma$ is the sigmoid function, $b_j$ is the bias of hidden unit $h_j$, $V$ is the number of visible units, $v_i$ is the (binary!) state of visible unit $i$, and $w_{ij}$ are the weights.
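In NumPy terms, the formula above can be sketched like this (the visible vector, biases, and weights here are hypothetical toy values, just to show the shapes):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy example: V = 3 visible units, H = 2 hidden units.
v = np.array([1.0, 0.0, 1.0])          # binary visible states v_i
b = np.array([-0.1, 0.2])              # hidden biases b_j
W = np.array([[0.5, -0.3],
              [0.1,  0.4],
              [-0.2, 0.6]])            # weights w_ij, shape (V, H)

# p(h_j = 1) = sigmoid(b_j + sum_i w_ij * v_i)
hidden_probs = sigmoid(b + v @ W)
```

The matrix product `v @ W` computes the inner sum for all hidden units at once, which is the same vectorization trick used in the MATLAB line below.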

So, your MATLAB code for obtaining the probabilities hidden_probs looks something like this (the sum is written implicitly as a matrix multiplication):

hidden_probs = sigmoid(hidden_bias + data * weights)

Now we have the probability $p(h_j=1)$ for each hidden unit $j \in [1,H]$. But this is only a probability, and we need a binary number, either 0 or 1. So the only thing we can do is draw a random sample from the distribution of $h_j$, which is a Bernoulli distribution.

As all hidden units are independent, we need one sample for each hidden unit, drawn independently. And in each training step, we need to draw fresh samples.

To draw these samples from the Bernoulli distribution, you can use built-in functions, e.g. binornd in MATLAB or numpy.random.binomial in Python. Note that these functions sample from a binomial distribution, but the Bernoulli distribution is just the special case of the binomial distribution with N=1. In MATLAB, that would be something like

hidden_states = binornd(1, hidden_probs)

which would create vector hidden_states which contains either 0 or 1, drawn randomly for each probability in hidden_probs.
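The Python equivalent is a sketch like the following (the probabilities are hypothetical; `binomial` with `n=1` is exactly a Bernoulli draw, one per entry of `hidden_probs`):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical probabilities p(h_j = 1) for three hidden units.
hidden_probs = np.array([0.1, 0.5, 0.9])

# One independent Bernoulli draw per hidden unit: n=1 makes
# the binomial distribution a Bernoulli distribution.
hidden_states = rng.binomial(n=1, p=hidden_probs)

# Sanity check: over many draws, the mean of each unit
# approaches its probability of being on.
samples = rng.binomial(n=1, p=hidden_probs, size=(100_000, 3))
```

Averaging `samples` along the first axis gives values close to `hidden_probs`, which confirms that each unit is on with exactly the intended probability.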

As you probably have noticed, nobody does that! Hinton, for example, describes it in his Practical Guide to Training RBMs as

the hidden unit turns on if this probability is greater than a random number uniformly distributed between 0 and 1.

That is exactly what Hinton does in his RBM code: he gets a random number for each hidden unit using rand, i.e. randomly sampled from the uniform distribution between [0,1]. He then does the comparison:

hidden_states = hidden_probs > rand(1, H)

This is equivalent to using binornd, but is probably faster. For example, to generate a number that is 1 with probability p=0.9, draw a random number from [0,1]. In 90% of cases this random number is smaller than 0.9, and in 10% of cases it is larger. So a number that is 1 with p=0.9 is obtained by evaluating 0.9 > rand(1), which is exactly what they do.
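A quick empirical check of this equivalence in NumPy (a sketch; the fraction of "on" states converges to p by the law of large numbers):

```python
import numpy as np

rng = np.random.default_rng(42)
p = 0.9
n = 100_000

# Hinton's trick: the unit is on whenever its probability
# beats a fresh uniform draw from [0, 1].
states = p > rng.random(n)

# The empirical on-fraction should be very close to p = 0.9.
on_fraction = states.mean()
```

Running this shows `on_fraction` within a fraction of a percent of 0.9, which is exactly the behavior binornd(1, p) would give.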

tl;dr: Pick a new random number from the range [0,1] for each hidden unit in each iteration. Compare it to your probability with hidden_probs > rand(1,H) to make it binary.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange