autoencoders pas d'apprentissage convolutifs

https://datascience.stackexchange.com/questions/15307

16-10-2019
|

Question

Je suis en train de mettre en œuvre autoencoders convolutifs dans tensorflow, sur l'ensemble de données mnist.

Le problème est que le autoencoder ne semble pas apprendre correctement: il apprendra toujours à reproduire la forme 0, mais pas d'autres formes, en fait je reçois habituellement une perte moyenne d'environ 0,09, ce qui est 1/10 du cours qu'il doit apprendre.

J'utilise 2x2 noyaux avec 2 foulées pour l'entrée et sortie convolutions, mais les filtres semble apprendre correctement. Quand je visualise les données, l'image d'entrée est passé à travers 16 (1er conv) et 32 ??filtres (2 conv) et par un contrôle de l'image, il semble fonctionner très bien (à savoir et en apparence comme les courbes, croix, etc sont détectés).

Le problème semble se poser dans la partie entièrement connecté du réseau:. Peu importe ce qui est l'image d'entrée, son encodage sera toujours même

Ma première pensée est « Je suis probablement le nourrir avec des zéros tandis que la formation », mais je ne pense pas avoir fait cette erreur (voir code ci-dessous).

Modifier Je réalise l'ensemble de données n'a pas été brassé, qui introduit un biais et pourrait être la cause du problème. Après son introduction, la perte moyenne est plus faible (0,06 au lieu de 0,09), et en fait l'aspect d'image de sortie comme un flou 8, mais les conclusions sont les mêmes:. L'entrée encodée sera le même, peu importe ce qui est l'image d'entrée

Ici, une entrée d'échantillon à la sortie par rapport

Voici l'activation de l'image ci-dessus, avec les deux couches entièrement connectées au fond (codage est la plus basse).

Enfin, il y a ici l'activation des couches entièrement connectées pour différentes entrées. Chaque image d'entrée correspond à une ligne dans les images d'activation.

Comme vous pouvez le voir, ils donnent toujours la même sortie. Si j'utilise des poids transposés au lieu d'initialiser différents ceux, la première couche FC (image au milieu) ressemble un peu plus aléatoire, mais la tendance sous-jacente est encore évidente. Dans la couche de codage (image en bas), la sortie sera toujours le même, peu importe ce qui est l'entrée (bien sûr, le modèle varie d'une formation et l'autre).

Voici le code correspondant

# A placeholder for the input data
x = tf.placeholder('float', shape=(None, mnist.data.shape[1]))

# conv2d_transpose cannot use -1 in output size so we read the value
# directly in the graph
batch_size = tf.shape(x)[0]

# Variables for weights and biases
with tf.variable_scope('encoding'):
    # After converting the input to a square image, we apply the first convolution, using 2x2 kernels
    with tf.variable_scope('conv1'):
        wec1 = tf.get_variable('w', shape=(2, 2, 1, m_c1), initializer=tf.truncated_normal_initializer())
        bec1 = tf.get_variable('b', shape=(m_c1,), initializer=tf.constant_initializer(0))
    # Second convolution
    with tf.variable_scope('conv2'):
        wec2 = tf.get_variable('w', shape=(2, 2, m_c1, m_c2), initializer=tf.truncated_normal_initializer())
        bec2 = tf.get_variable('b', shape=(m_c2,), initializer=tf.constant_initializer(0))
    # First fully connected layer
    with tf.variable_scope('fc1'):
        wef1 = tf.get_variable('w', shape=(7*7*m_c2, n_h1), initializer=tf.contrib.layers.xavier_initializer())
        bef1 = tf.get_variable('b', shape=(n_h1,), initializer=tf.constant_initializer(0))
    # Second fully connected layer
    with tf.variable_scope('fc2'):
        wef2 = tf.get_variable('w', shape=(n_h1, n_h2), initializer=tf.contrib.layers.xavier_initializer())
        bef2 = tf.get_variable('b', shape=(n_h2,), initializer=tf.constant_initializer(0))

reshaped_x = tf.reshape(x, (-1, 28, 28, 1))
y1 = tf.nn.conv2d(reshaped_x, wec1, strides=(1, 2, 2, 1), padding='VALID')
y2 = tf.nn.sigmoid(y1 + bec1)
y3 = tf.nn.conv2d(y2, wec2, strides=(1, 2, 2, 1), padding='VALID')
y4 = tf.nn.sigmoid(y3 + bec2)
y5 = tf.reshape(y4, (-1, 7*7*m_c2))
y6 = tf.nn.sigmoid(tf.matmul(y5, wef1) + bef1)
encode = tf.nn.sigmoid(tf.matmul(y6, wef2) + bef2)

with tf.variable_scope('decoding'):
    # for the transposed convolutions, we use the same weights defined above
    with tf.variable_scope('fc1'):
        #wdf1 = tf.transpose(wef2)
        wdf1 = tf.get_variable('w', shape=(n_h2, n_h1), initializer=tf.contrib.layers.xavier_initializer())
        bdf1 = tf.get_variable('b', shape=(n_h1,), initializer=tf.constant_initializer(0))
    with tf.variable_scope('fc2'):
        #wdf2 = tf.transpose(wef1)
        wdf2 = tf.get_variable('w', shape=(n_h1, 7*7*m_c2), initializer=tf.contrib.layers.xavier_initializer())
        bdf2 = tf.get_variable('b', shape=(7*7*m_c2,), initializer=tf.constant_initializer(0))
    with tf.variable_scope('deconv1'):
        wdd1 = tf.get_variable('w', shape=(2, 2, m_c1, m_c2), initializer=tf.contrib.layers.xavier_initializer())
        bdd1 = tf.get_variable('b', shape=(m_c1,), initializer=tf.constant_initializer(0))
    with tf.variable_scope('deconv2'):
        wdd2 = tf.get_variable('w', shape=(2, 2, 1, m_c1), initializer=tf.contrib.layers.xavier_initializer())
        bdd2 = tf.get_variable('b', shape=(1,), initializer=tf.constant_initializer(0))

u1 = tf.nn.sigmoid(tf.matmul(encode, wdf1) + bdf1)
u2 = tf.nn.sigmoid(tf.matmul(u1, wdf2) + bdf2)
u3 = tf.reshape(u2, (-1, 7, 7, m_c2))
u4 = tf.nn.conv2d_transpose(u3, wdd1, output_shape=(batch_size, 14, 14, m_c1), strides=(1, 2, 2, 1), padding='VALID')
u5 = tf.nn.sigmoid(u4 + bdd1)
u6 = tf.nn.conv2d_transpose(u5, wdd2, output_shape=(batch_size, 28, 28, 1), strides=(1, 2, 2, 1), padding='VALID')
u7 = tf.nn.sigmoid(u6 + bdd2)
decode = tf.reshape(u7, (-1, 784))

loss = tf.reduce_mean(tf.square(x - decode))
opt = tf.train.AdamOptimizer(0.0001).minimize(loss)

try:
    tf.global_variables_initializer().run()
except AttributeError:
    tf.initialize_all_variables().run() # Deprecated after r0.11

print('Starting training...')
bs = 1000 # Batch size
for i in range(501): # Reasonable results around this epoch 
    # Apply permutation of data at each epoch, should improve convergence time
    train_data = np.random.permutation(mnist.data)
    if i % 100 == 0:
        print('Iteration:', i, 'Loss:', loss.eval(feed_dict={x: train_data}))
    for j in range(0, train_data.shape[0], bs):
        batch = train_data[j*bs:(j+1)*bs]
        sess.run(opt, feed_dict={x: batch})
        # TODO introduce noise
print('Training done')

La solution

Eh bien, le problème est principalement lié à la taille du noyau. En utilisant convolution 2x2 avec foulée de (2,2) tourné pour être une mauvaise idée. En utilisant 5x5 et 3x3 tailles donné des résultats décents.

Licencié sous: CC-BY-SA avec attribution

Non affilié à datascience.stackexchange