Inferring missing data with Restricted Boltzmann Machines

Question 1

This video may be of some help: https://www.youtube.com/watch?v=laVC6WFIXjg

I think that sampling after imputing random values is a good idea; Hinton justifies this in the video. You can also try to estimate a prior for the missing values, draw many samples, or make initial guesses with some other method and then do the reconstruction.

In the video Hinton says that this method is not very accurate on its own, but that it can be very powerful when combined with matrix factorization (or other similar methods).
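To make the initialization options above concrete, here is a minimal sketch of two ways to fill the missing entries before any reconstruction. It assumes binary data, and it reads "estimate a prior" as "use each feature's observed frequency", which is my interpretation rather than something stated in the video. The function name `init_missing` and its parameters are hypothetical.

```python
import numpy as np

def init_missing(X, observed, strategy="random", rng=None):
    """Fill missing entries of a binary data matrix before running the RBM.

    X        : (n_samples, n_features) array; missing entries may hold anything (e.g. NaN)
    observed : boolean mask of the same shape, True where the value is known
    strategy : "random" -> fair coin flips
               "prior"  -> Bernoulli draws using each feature's observed mean
                           (an assumed, simple notion of "prior")
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float).copy()
    observed = np.asarray(observed, dtype=bool)

    if strategy == "random":
        p = np.full(X.shape[1], 0.5)
    elif strategy == "prior":
        counts = observed.sum(axis=0)
        sums = np.where(observed, X, 0.0).sum(axis=0)
        # Fall back to 0.5 for features that are never observed.
        p = np.where(counts > 0, sums / np.maximum(counts, 1), 0.5)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    # Draw a full matrix of candidates, then copy only into the missing slots.
    draws = (rng.random(X.shape) < p).astype(float)
    X[~observed] = draws[~observed]
    return X
```

Either output can then be used as the starting point for the reconstruction; the observed entries are never modified.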

Question 2

The idea is to perform alternating Gibbs sampling while keeping the non-missing values fixed (clamped) to their data values in the reconstruction update. Repeat this until the Markov chains for the missing values reach their stationary distribution; the resulting samples are then the network's best guess as to what those values ought to be.
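The clamped sampling described above could be sketched roughly as follows. This assumes a binary RBM that has already been trained, with weights `W` of shape (n_visible, n_hidden) and biases `b_vis`, `b_hid`. The function name and the `n_steps` / `burn_in` defaults are hypothetical choices, not values from the video.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def impute_clamped_gibbs(v, observed, W, b_vis, b_hid,
                         n_steps=200, burn_in=100, rng=None):
    """Estimate missing binary visible units of a trained RBM.

    v        : (n_visible,) array; values at missing positions are ignored
    observed : (n_visible,) boolean mask, True where the data value is known
    Returns, for every visible unit, p(v | h) averaged over the steps after
    burn-in; observed units are returned at their data values.
    """
    rng = np.random.default_rng() if rng is None else rng
    observed = np.asarray(observed, dtype=bool)
    v = np.asarray(v, dtype=float).copy()

    # Start the missing units from random binary values.
    v[~observed] = rng.integers(0, 2, size=int((~observed).sum()))
    data = v.copy()

    running = np.zeros_like(v)
    for step in range(n_steps):
        # Hidden update: sample h ~ p(h | v).
        p_h = sigmoid(v @ W + b_hid)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        # Reconstruction update: sample v ~ p(v | h) ...
        p_v = sigmoid(h @ W.T + b_vis)
        v = (rng.random(p_v.shape) < p_v).astype(float)
        # ... then clamp the observed units back to the data.
        v[observed] = data[observed]
        if step >= burn_in:
            running += p_v

    estimate = running / (n_steps - burn_in)
    estimate[observed] = data[observed]
    return estimate
```

Averaging the probabilities after a burn-in period, instead of taking the last sample, is a design choice made here to reduce noise. Thresholding the result at 0.5 gives a hard guess for each missing unit.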

Question 3

In fact, the dependency on the initial values given to the missing visible nodes can be used to gain an extra 2-5% of accuracy. You can run the RBM several times under different initializations and then average the results. Every final state will contain errors, but they will differ from each other. I tried it, and the results kept improving until about the 20th initialization...
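A rough sketch of this averaging, assuming a trained binary RBM with weights `W` of shape (n_visible, n_hidden): the block carries its own compact clamped sampler so that it runs standalone. All names are hypothetical. The default `n_inits=20` just mirrors the observation above and is not a tuned value, and `n_steps=50` is arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def clamped_gibbs_run(v, observed, W, b_vis, b_hid, n_steps, rng):
    """One clamped Gibbs chain from a fresh random start; returns the final p(v | h)."""
    v = np.asarray(v, dtype=float).copy()
    # New random initialization of the missing units for this run.
    v[~observed] = rng.integers(0, 2, size=int((~observed).sum()))
    data = v.copy()
    p_v = v
    for _ in range(n_steps):
        p_h = sigmoid(v @ W + b_hid)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        p_v = sigmoid(h @ W.T + b_vis)
        v = (rng.random(p_v.shape) < p_v).astype(float)
        v[observed] = data[observed]  # keep known values fixed
    return p_v

def average_over_initializations(v, observed, W, b_vis, b_hid,
                                 n_inits=20, n_steps=50, seed=0):
    """Average the reconstructions from several independently initialized runs."""
    observed = np.asarray(observed, dtype=bool)
    rng = np.random.default_rng(seed)
    runs = np.stack([
        clamped_gibbs_run(v, observed, W, b_vis, b_hid, n_steps, rng)
        for _ in range(n_inits)
    ])
    estimate = runs.mean(axis=0)
    # Observed units are reported at their data values.
    estimate[observed] = np.asarray(v, dtype=float)[observed]
    return estimate
```

Since each run makes different mistakes, the mean tends to be less noisy than any single run. One way to find where the gain levels off is to increase `n_inits` and watch the error on held-out known values.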