Generative Adversarial Text to Image Synthesis

https://datascience.stackexchange.com/questions/77369

12-12-2020

Question

Can anyone explain the meaning of this line: "Deep networks have been shown to learn representations in which interpolations between embedding pairs tend to be near the data manifold".

Reference: Section 4.3 of the paper Generative Adversarial Text to Image Synthesis

Solution

There are several important concepts in this sentence so let's break them down.

"Data manifold"

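Treating real data as a manifold is a standard modelling assumption in machine learning: the data live in a high-dimensional space (pixel space, for example), but realistic samples are assumed to concentrate near a much lower-dimensional surface inside it. I recommend Christopher Olah's article on the topic.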

"Interpolation between embedding pairs"

One way of generating new data is to sample the embedding space learned by the neural network. For example, you can take two real data samples, compute their embeddings, interpolate between them to obtain an intermediate embedding, and then look at the output of your neural network when it is fed this intermediate embedding.
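To make the interpolation step concrete, here is a minimal sketch in plain Python. The function name `interpolate` and the toy 3-dimensional vectors are assumptions made for illustration; in practice the embeddings would come from a trained encoder and have many more dimensions.

```python
def interpolate(z_a, z_b, t):
    """Linear interpolation between two embeddings: t=0 gives z_a, t=1 gives z_b."""
    if len(z_a) != len(z_b):
        raise ValueError("embeddings must have the same dimension")
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]

# Hypothetical embeddings of two real samples (toy values, not from any model).
z_a = [0.0, 1.0, 2.0]
z_b = [1.0, 3.0, 0.0]

# Five evenly spaced embeddings on the straight segment from z_a to z_b.
steps = 5
path = [interpolate(z_a, z_b, i / (steps - 1)) for i in range(steps)]
midpoint = path[2]  # corresponds to t = 0.5
```

Each element of `path` is an intermediate embedding that could be fed to the network in place of a real one.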

"Interpolations between embedding pairs tend to be near the data manifold"

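The claim is that the output your neural network produces from such an intermediate embedding should still be realistic. In mathematical terms, it should lie near the real data manifold, even though the intermediate embedding does not correspond to any actual training sample.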
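A toy illustration of why this matters, under strong simplifying assumptions: the "data manifold" here is the unit circle in 2-D, the "embedding" is an angle, and `decode` is a hand-written stand-in for a learned network (none of this comes from the paper). Interpolating in embedding space and then decoding stays on the circle, while averaging the two data points directly does not.

```python
import math

def decode(theta):
    """Toy 'decoder': maps a 1-D embedding (an angle) to a point on the unit circle."""
    return (math.cos(theta), math.sin(theta))

def distance_to_manifold(point):
    """Distance from a 2-D point to the unit circle, our stand-in data manifold."""
    return abs(math.hypot(point[0], point[1]) - 1.0)

# Embeddings of two "real" samples, and the samples themselves.
theta_a, theta_b = 0.0, math.pi / 2
x_a, x_b = decode(theta_a), decode(theta_b)

# Interpolate in embedding space, then decode.
from_embedding = decode(0.5 * (theta_a + theta_b))

# Interpolate directly in data space (the analogue of averaging two images pixel-wise).
from_data = (0.5 * (x_a[0] + x_b[0]), 0.5 * (x_a[1] + x_b[1]))

err_embedding = distance_to_manifold(from_embedding)  # zero up to rounding error
err_data = distance_to_manifold(from_data)            # about 0.29, clearly off the circle
```

A real encoder and decoder are learned rather than written by hand, so the result only "tends to be" near the manifold, which is the hedge in the quoted sentence.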

This is the main point of generative models such as generative adversarial networks or variational autoencoders. They learn a mapping between a simple prior distribution, usually a Gaussian, and the real data distribution: they learn to "convert" noise into realistic data (and, in the case of variational autoencoders, data back into noise).

This is closely related to what is often called disentanglement. As explained in the first paper that the article cites after that sentence,

"deeper representations, when well trained, tend to do a better job at disentangling the underlying factors of variation."

In other words, we can obtain deep embeddings that isolate the factors of variation of the real data. Ideally, embeddings of human faces would isolate one axis that controls hair color, another that controls the expression of the mouth, and so on (as in TL-GAN, for example). But these factors of variation are not always so easy to interpret.
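As a rough sketch of what "isolating an axis" means in practice, editing one factor amounts to shifting an embedding along a single direction while leaving the rest untouched. The helper `move_along`, the direction `hair_color_axis`, and all numeric values below are hypothetical; finding such a direction in a real model is the hard part and is not shown here.

```python
def move_along(z, direction, alpha):
    """Shift embedding z by alpha along the unit-normalised direction."""
    norm = sum(d * d for d in direction) ** 0.5
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [zi + alpha * d / norm for zi, d in zip(z, direction)]

# Hypothetical embedding of one face (toy 3-dimensional values).
z = [0.2, -1.0, 0.5]

# Hypothetical direction assumed to control a single factor, e.g. hair color.
hair_color_axis = [0.0, 2.0, 0.0]

# Move 1.5 units along that axis; the other coordinates are unchanged.
z_edited = move_along(z, hair_color_axis, 1.5)
```

In a well-disentangled embedding, decoding `z_edited` would change only the targeted factor; in a real model the directions are rarely this clean.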

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange