Question

I’m trying to understand, and eventually build, a Restricted Boltzmann Machine (RBM). I understand that the update rule (that is, the algorithm used to change the weights) is something called “contrastive divergence”. I looked this up on Wikipedia and found these steps:

  1. Take a training sample v, compute the probabilities of the hidden units and sample a hidden activation vector h from this probability distribution.
  2. Compute the outer product of v and h and call this the positive gradient.
  3. From h, sample a reconstruction v' of the visible units, then resample the hidden activations h' from this. (Gibbs sampling step)
  4. Compute the outer product of v' and h' and call this the negative gradient.
  5. ...

I don’t understand step 3, and I’m struggling to grasp the concept of Gibbs sampling. Could someone explain it to me in simple terms? I have covered neural networks, if that helps.
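To make my question concrete, here is how I would translate the steps into Python with NumPy. This is only my own sketch, not code from the article: the sigmoid activations, the bias vectors `b` and `c`, and the final weight update are assumptions on my part, so please correct anything that’s wrong.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy dimensions and parameters (made up for illustration).
n_visible, n_hidden = 6, 3
W = rng.normal(0, 0.1, size=(n_visible, n_hidden))  # weight matrix
b = np.zeros(n_visible)                              # visible biases (my assumption)
c = np.zeros(n_hidden)                               # hidden biases (my assumption)

v = rng.integers(0, 2, size=n_visible).astype(float)  # a binary training sample

# Step 1: compute hidden probabilities given v, then sample h.
p_h = sigmoid(v @ W + c)
h = (rng.random(n_hidden) < p_h).astype(float)

# Step 2: positive gradient = outer product of v and h.
pos_grad = np.outer(v, h)

# Step 3 (the Gibbs sampling step I'm asking about):
# sample a reconstruction v' from h, then resample h' from v'.
p_v_prime = sigmoid(h @ W.T + b)
v_prime = (rng.random(n_visible) < p_v_prime).astype(float)
p_h_prime = sigmoid(v_prime @ W + c)
h_prime = (rng.random(n_hidden) < p_h_prime).astype(float)

# Step 4: negative gradient = outer product of v' and h'.
neg_grad = np.outer(v_prime, h_prime)

# Step 5 (my guess at the elided step): update the weights by the
# difference of the two gradients, scaled by a learning rate.
learning_rate = 0.1
W += learning_rate * (pos_grad - neg_grad)
```

The three sampling lines under step 3 are the part I can’t picture, so corrections or explanations of that block would be especially helpful.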

