How is dimensionality reduction achieved in deep belief networks with restricted Boltzmann machines?

datascience.stackexchange https://datascience.stackexchange.com/questions/15163

Question

In neural networks and older classification methods, we usually construct an objective function to achieve dimensionality reduction. But deep belief networks (DBNs) with restricted Boltzmann machines (RBMs) learn the structure of the data through unsupervised learning. How do they achieve dimensionality reduction without knowing the ground truth and without constructing an objective function?


Solution

As you know, a deep belief network (DBN) is a stack of restricted Boltzmann machines (RBMs), so let's look at the RBM: a restricted Boltzmann machine is a generative model, which means it is able to generate samples from the learned probability distribution at the visible units (the input). While training the RBM, you teach it how your input samples are distributed, and the RBM learns how it could generate such samples. It can do so by adjusting the visible and hidden biases, and the weights in between.
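To make those pieces concrete, here is a minimal NumPy sketch (not from the answer itself) of an RBM's parameters and the conditional sampling it uses to generate visible samples; the layer sizes and the binary (Bernoulli) units are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_visible, n_hidden = 784, 128                   # assumed sizes, e.g. 28x28 images -> 128-dim code
W = rng.normal(0, 0.01, (n_visible, n_hidden))   # weights w_ij
a = np.zeros(n_visible)                          # visible biases a_i
b = np.zeros(n_hidden)                           # hidden biases b_j

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v):
    """p(h_j = 1 | v) and a Bernoulli sample of the hidden units."""
    p_h = sigmoid(v @ W + b)
    return p_h, (rng.random(p_h.shape) < p_h).astype(float)

def sample_visible(h):
    """p(v_i = 1 | h) and a Bernoulli sample of the visible units."""
    p_v = sigmoid(h @ W.T + a)
    return p_v, (rng.random(p_v.shape) < p_v).astype(float)

# One Gibbs step: start from a data vector, sample hidden units, then reconstruct.
v0 = rng.integers(0, 2, n_visible).astype(float)
_, h0 = sample_hidden(v0)
p_v1, v1 = sample_visible(h0)                    # a sample "generated" by the model
```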

The choice of the number of hidden units is completely up to you: if you choose to give it fewer hidden than visible units, the RBM will try to recreate the probability distribution at the input with only the number of hidden units it has. And that is already the objective: $p(\mathbf{v})$, the probability distribution at the visible units, should be as close as possible to the probability distribution of your data $p(\text{data})$.
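As a hedged illustration of the dimensionality-reduction aspect: once the RBM is trained, the hidden-unit probabilities for an input vector can serve as its lower-dimensional code. The sizes below are assumptions chosen only to show the bottleneck.

```python
import numpy as np

rng = np.random.default_rng(1)
n_visible, n_hidden = 784, 128                   # 128 < 784: the hidden layer is the bottleneck
W = rng.normal(0, 0.01, (n_visible, n_hidden))   # in practice, learned weights
b = np.zeros(n_hidden)                           # in practice, learned hidden biases

def encode(v, W, b):
    """Map a visible vector to its lower-dimensional hidden representation p(h=1 | v)."""
    return 1.0 / (1.0 + np.exp(-(v @ W + b)))

x = rng.integers(0, 2, n_visible).astype(float)  # one (random) stand-in data vector
code = encode(x, W, b)                           # the 128-dimensional reduced representation
print(code.shape)                                # (128,)
```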

To do that, we assign an energy function (both equations taken from A Practical Guide to Training RBMs by G. Hinton) $$E(\mathbf{v},\mathbf{h}) = -\sum_{i \in \text{visible}} a_i v_i - \sum_{j \in \text{hidden}} b_j h_j - \sum_{i,j} v_i h_j w_{ij}$$ to each configuration of visible units $\mathbf{v}$ and hidden units $\mathbf{h}$. Here, $a_i$ and $b_j$ are the biases, and $w_{ij}$ are the weights. Given this energy function, the probability of a visible vector $\mathbf{v}$ is $$p(\mathbf{v}) = \frac 1Z \sum_{\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h})}$$ where $Z$ is the partition function (normalizing constant). With that, we know that to increase the probability of the RBM generating a training sample $\mathbf{v}^{(k)}$ (where $\mathbf{v}^{(k)}$ denotes the $k$-th training sample), we need to change $a_i$, $b_j$ and $w_{ij}$ so that the energy $E$ for our given $\mathbf{v}^{(k)}$ and the corresponding $\mathbf{h}$ gets lower.
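As a rough illustration, the sketch below computes this energy and runs a few steps of contrastive divergence (CD-1), the usual approximation from Hinton's guide for following the gradient of $\log p(\mathbf{v})$; the tiny layer sizes and the learning rate are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n_visible, n_hidden, lr = 6, 3, 0.1              # assumed sizes and learning rate
W = rng.normal(0, 0.01, (n_visible, n_hidden))
a = np.zeros(n_visible)
b = np.zeros(n_hidden)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h):
    """E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_ij v_i h_j w_ij"""
    return -(a @ v) - (b @ h) - v @ W @ h

def cd1_step(v0):
    """One CD-1 update, nudging a_i, b_j and w_ij toward lower energy for v0."""
    global W, a, b
    p_h0 = sigmoid(v0 @ W + b)                   # positive phase: data-driven hidden probabilities
    h0 = (rng.random(n_hidden) < p_h0).astype(float)
    p_v1 = sigmoid(h0 @ W.T + a)                 # reconstruction
    p_h1 = sigmoid(p_v1 @ W + b)                 # negative phase
    W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
    a += lr * (v0 - p_v1)
    b += lr * (p_h0 - p_h1)

v_k = rng.integers(0, 2, n_visible).astype(float)          # a stand-in "training sample"
h_k = (rng.random(n_hidden) < sigmoid(v_k @ W + b)).astype(float)
before = energy(v_k, h_k)
for _ in range(20):
    cd1_step(v_k)
after = energy(v_k, h_k)
print(before, after)   # the energy of the training configuration typically decreases
```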

Licensed under: CC-BY-SA with attribution