卷积自动编码器不学习

https://datascience.stackexchange.com/questions/15307

16-10-2019
|

题

我正在尝试在MNIST数据集中在TensorFlow中实现卷积自动编码器。

问题是自动编码器似乎无法正确学习：它将始终学会重现0个形状，但没有其他形状，实际上我通常会平均损失约0.09，这是其中的1/10个类别应该学习。

我使用的是2x2核，用于输入和输出卷积，但是过滤器似乎是正确学习的。当我可视化数据时，输入图像将通过16（第1个转换）和32个过滤器（第二个转换）传递，并且通过图像检查似乎运行良好（即显然检测到曲线，十字架等特征）。

问题似乎在网络的完全连接部分中出现：无论输入图像是什么，其编码都将是总是相同。

我的第一个想法是“我可能只是在训练时用零来喂食”，但我认为我没有犯此错误（请参见下面的代码）。

编辑我意识到数据集没有被改组，这引入了偏见，可能是问题的原因。引入它后，平均损失较低（0.06而不是0.09），实际上，输出图像看起来像一个模糊8，但结论是相同的：无论输入图像是什么，编码的输入都将是相同的。

这里有相对输出的样本输入

这是上面图像的激活，底部有两个完全连接的层（编码是Bottommost）。

最后，这里有针对不同输入的完全连接层的激活。每个输入图像对应于激活图像中的一条线。

如您所见，它们总是产生相同的输出。如果我使用转置重量而不是初始化不同的权重，则第一个FC层（中间的图像）看起来更随机，但是基础模式仍然很明显。在编码层（底部的图像）中，无论输入是什么（当然，模式与一个训练和下一个训练）都将始终相同。

这是相关代码

# A placeholder for the input data
x = tf.placeholder('float', shape=(None, mnist.data.shape[1]))

# conv2d_transpose cannot use -1 in output size so we read the value
# directly in the graph
batch_size = tf.shape(x)[0]

# Variables for weights and biases
with tf.variable_scope('encoding'):
    # After converting the input to a square image, we apply the first convolution, using 2x2 kernels
    with tf.variable_scope('conv1'):
        wec1 = tf.get_variable('w', shape=(2, 2, 1, m_c1), initializer=tf.truncated_normal_initializer())
        bec1 = tf.get_variable('b', shape=(m_c1,), initializer=tf.constant_initializer(0))
    # Second convolution
    with tf.variable_scope('conv2'):
        wec2 = tf.get_variable('w', shape=(2, 2, m_c1, m_c2), initializer=tf.truncated_normal_initializer())
        bec2 = tf.get_variable('b', shape=(m_c2,), initializer=tf.constant_initializer(0))
    # First fully connected layer
    with tf.variable_scope('fc1'):
        wef1 = tf.get_variable('w', shape=(7*7*m_c2, n_h1), initializer=tf.contrib.layers.xavier_initializer())
        bef1 = tf.get_variable('b', shape=(n_h1,), initializer=tf.constant_initializer(0))
    # Second fully connected layer
    with tf.variable_scope('fc2'):
        wef2 = tf.get_variable('w', shape=(n_h1, n_h2), initializer=tf.contrib.layers.xavier_initializer())
        bef2 = tf.get_variable('b', shape=(n_h2,), initializer=tf.constant_initializer(0))

reshaped_x = tf.reshape(x, (-1, 28, 28, 1))
y1 = tf.nn.conv2d(reshaped_x, wec1, strides=(1, 2, 2, 1), padding='VALID')
y2 = tf.nn.sigmoid(y1 + bec1)
y3 = tf.nn.conv2d(y2, wec2, strides=(1, 2, 2, 1), padding='VALID')
y4 = tf.nn.sigmoid(y3 + bec2)
y5 = tf.reshape(y4, (-1, 7*7*m_c2))
y6 = tf.nn.sigmoid(tf.matmul(y5, wef1) + bef1)
encode = tf.nn.sigmoid(tf.matmul(y6, wef2) + bef2)

with tf.variable_scope('decoding'):
    # for the transposed convolutions, we use the same weights defined above
    with tf.variable_scope('fc1'):
        #wdf1 = tf.transpose(wef2)
        wdf1 = tf.get_variable('w', shape=(n_h2, n_h1), initializer=tf.contrib.layers.xavier_initializer())
        bdf1 = tf.get_variable('b', shape=(n_h1,), initializer=tf.constant_initializer(0))
    with tf.variable_scope('fc2'):
        #wdf2 = tf.transpose(wef1)
        wdf2 = tf.get_variable('w', shape=(n_h1, 7*7*m_c2), initializer=tf.contrib.layers.xavier_initializer())
        bdf2 = tf.get_variable('b', shape=(7*7*m_c2,), initializer=tf.constant_initializer(0))
    with tf.variable_scope('deconv1'):
        wdd1 = tf.get_variable('w', shape=(2, 2, m_c1, m_c2), initializer=tf.contrib.layers.xavier_initializer())
        bdd1 = tf.get_variable('b', shape=(m_c1,), initializer=tf.constant_initializer(0))
    with tf.variable_scope('deconv2'):
        wdd2 = tf.get_variable('w', shape=(2, 2, 1, m_c1), initializer=tf.contrib.layers.xavier_initializer())
        bdd2 = tf.get_variable('b', shape=(1,), initializer=tf.constant_initializer(0))

u1 = tf.nn.sigmoid(tf.matmul(encode, wdf1) + bdf1)
u2 = tf.nn.sigmoid(tf.matmul(u1, wdf2) + bdf2)
u3 = tf.reshape(u2, (-1, 7, 7, m_c2))
u4 = tf.nn.conv2d_transpose(u3, wdd1, output_shape=(batch_size, 14, 14, m_c1), strides=(1, 2, 2, 1), padding='VALID')
u5 = tf.nn.sigmoid(u4 + bdd1)
u6 = tf.nn.conv2d_transpose(u5, wdd2, output_shape=(batch_size, 28, 28, 1), strides=(1, 2, 2, 1), padding='VALID')
u7 = tf.nn.sigmoid(u6 + bdd2)
decode = tf.reshape(u7, (-1, 784))

loss = tf.reduce_mean(tf.square(x - decode))
opt = tf.train.AdamOptimizer(0.0001).minimize(loss)

try:
    tf.global_variables_initializer().run()
except AttributeError:
    tf.initialize_all_variables().run() # Deprecated after r0.11

print('Starting training...')
bs = 1000 # Batch size
for i in range(501): # Reasonable results around this epoch 
    # Apply permutation of data at each epoch, should improve convergence time
    train_data = np.random.permutation(mnist.data)
    if i % 100 == 0:
        print('Iteration:', i, 'Loss:', loss.eval(feed_dict={x: train_data}))
    for j in range(0, train_data.shape[0], bs):
        batch = train_data[j*bs:(j+1)*bs]
        sess.run(opt, feed_dict={x: batch})
        # TODO introduce noise
print('Training done')

解决方案

好吧，问题主要与内核大小有关。使用2x2卷积（2,2）的卷积变成了一个坏主意。使用5x5和3x3尺寸产生了不错的结果。

许可以下： CC-BY-SA 和归因

不隶属于 datascience.stackexchange