Question

Let $v \in \{0,1\}^M$ be the visible layer and $h \in \{0,1\}^N$ the hidden layer, where $M$ and $N$ are natural numbers. Given biases $b \in \mathbb{R}^M$, $c \in \mathbb{R}^N$ and weights $W \in \mathbb{R}^{M \times N}$, the energy and probability of an RBM are given by:

Energy, $E(v,h; b,c,W) = -b^T v - c^T h - h^T W^T v$

Probability, $p(v, h; b, c, W) = \frac{e^{-E(v,h; b,c,W)}}{Z}$

where $Z = \sum_{v,h} e^{-E(v,h; b,c,W)}$
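
For concreteness, here is a minimal brute-force sketch of these quantities in NumPy (the helper names `energy` and `partition_function` are my own, and the enumeration is only feasible for toy-sized $M$ and $N$):

```python
import itertools

import numpy as np

def energy(v, h, b, c, W):
    # E(v, h; b, c, W) = -b^T v - c^T h - h^T W^T v
    return -b @ v - c @ h - h @ W.T @ v

def partition_function(b, c, W):
    # Z: brute-force sum of e^{-E(v, h)} over all 2^(M+N) binary configurations.
    M, N = W.shape
    return sum(
        np.exp(-energy(np.array(v), np.array(h), b, c, W))
        for v in itertools.product([0, 1], repeat=M)
        for h in itertools.product([0, 1], repeat=N)
    )
```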

The negative log likelihood error for a Restricted Boltzmann Machine (RBM) is given by:

$\mathcal{L}(b,c,W) = \frac{1}{T} \sum_{t=1}^{T} \left( -\log \sum_{h} e^{-E(v^t, h; b,c,W)} \right) + \log Z$

where:

$T$ is the size of the training dataset; and

$v^t$ is the $t^{\text{th}}$ data point in the training dataset.
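
Continuing the sketch above (same hypothetical helpers, toy sizes only), this loss can be written directly by enumeration:

```python
def nll(V, b, c, W):
    # L(b, c, W) for a training set V of shape (T, M), by brute force.
    N = W.shape[1]
    log_Z = np.log(partition_function(b, c, W))
    per_sample = []
    for v_t in V:
        # Unnormalized marginal of v_t: sum over all 2^N hidden configurations.
        s = sum(
            np.exp(-energy(v_t, np.array(h), b, c, W))
            for h in itertools.product([0, 1], repeat=N)
        )
        per_sample.append(-np.log(s))
    return np.mean(per_sample) + log_Z
```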

It is clear that computing $Z$ (and hence $\log Z$) is intractable, because we have to sum over all $2^{M+N}$ configurations of $v$ and $h$, which is an exponential-time computation.

However, shouldn't computing $\sum_{h} e^{-E(v^t, h; b,c,W)}$ be intractable as well? Aren't we summing over all $2^N$ configurations of $h$ here? If, say, $N = 64$, then we are already at exascale ($2^{64} \approx 1.84 \times 10^{19}$ terms)!
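
To make the concern concrete, this is the enumeration in question, written naively (again a hypothetical helper building on the sketch above); the inner generator runs $2^N$ times per data point:

```python
def unnormalized_marginal(v_t, b, c, W):
    # Naive sum_h e^{-E(v_t, h)}: iterates over all 2^N hidden configurations.
    N = W.shape[1]
    return sum(
        np.exp(-energy(v_t, np.array(h), b, c, W))
        for h in itertools.product([0, 1], repeat=N)
    )

# At N = 64 this loop would need 2**64 iterations:
print(f"{2**64:.3e}")  # 1.845e+19
```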
