Question

For normal supervised learning, the dataset is split into train and test sets (let's keep it simple).

Generative Adversarial Networks are an unsupervised learning approach, but there is a supervised loss function in the discriminator.

Does it make sense to split the data into train and test when training GANs?

My first instinct is no, but I am not 100% sure. Is there any reason why having a test set would help the Generator?


Solution

The purpose of the test split is normally to evaluate the performance of your model on data it has not seen before.

While the available performance measures for GAN generators have their problems, they do exist. For images, you have the Inception Score and the Fréchet Inception Distance. For text, you have quality vs. diversity plots.

The evaluation measures mentioned above compare aspects of the generated samples against real data. To evaluate the performance of a GAN generator, you should use data it has not seen before, i.e. a test set. Therefore, it does make sense to have a train/test split when evaluating GANs.
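For concreteness, here is a minimal sketch of computing FID between generator outputs and a held-out test split. It assumes you have already extracted Inception features yourself into `test_features` and `gen_features` (illustrative names, not from any specific library); the formula follows the standard FID definition.

```python
# Minimal sketch: Frechet Inception Distance between generated samples and a
# held-out test split. `test_features` and `gen_features` are assumed to be
# (N, D) arrays of Inception activations extracted beforehand.
import numpy as np
from scipy import linalg


def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """FID = ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2*sqrt(C_r @ C_f))."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    cov_sqrt, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)
    if np.iscomplexobj(cov_sqrt):  # numerical noise can add tiny imaginary parts
        cov_sqrt = cov_sqrt.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * cov_sqrt))


# Compare generated samples against the *test* split, which the GAN never
# saw during training:
# fid = frechet_distance(test_features, gen_features)
```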

Other tips

Training GANs is only a partially unsupervised task, IMHO. It's certainly unsupervised for the Generator, but it's supervised for the Discriminator. So it might be useful to test the Discriminator's ability to distinguish fake from real cases on new data it has never seen before.

In other words, it makes sense to split your dataset into train(-validation)-test if you want to understand the Discriminator's ability to generalize its task to data it has never seen before. If that is not of interest to you, I guess you don't need to do it.
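As a rough illustration, such a held-out check on the Discriminator could look like the sketch below. `generator`, `discriminator`, `test_loader`, and `latent_dim` are placeholder names for your own PyTorch models and data, and the discriminator is assumed to output a raw logit where positive means "real".

```python
# Hedged sketch: how well does a trained discriminator separate real
# held-out images from fresh generator samples?
import torch


@torch.no_grad()
def discriminator_test_accuracy(generator, discriminator, test_loader,
                                latent_dim=100, device="cpu"):
    correct, total = 0, 0
    for real_batch, _ in test_loader:
        real_batch = real_batch.to(device)
        n = real_batch.size(0)

        # Real held-out samples should be classified as real (logit > 0).
        correct += (discriminator(real_batch) > 0).sum().item()

        # Fresh fake samples should be classified as fake (logit <= 0).
        z = torch.randn(n, latent_dim, device=device)
        correct += (discriminator(generator(z)) <= 0).sum().item()

        total += 2 * n
    return correct / total
```

An accuracy far above 50% on this held-out data suggests the Discriminator has learned features that generalize, rather than memorizing the training set.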

Licensed under: CC-BY-SA with attribution