学習していない畳み込み自動エンコーダー

https://datascience.stackexchange.com/questions/15307

16-10-2019
|

質問

MNISTデータセットにTensorflowで畳み込み自動エンコーダーを実装しようとしています。

問題は、自動エンコーダーが適切に学習していないように見えることです。常に0の形状を再現することを学びますが、他の形状はありません。実際、私は通常、約0.09の平均損失を得ます。これはクラスの1/10です。学ぶべきです。

入力および出力の畳み込みには、Stride 2の2x2カーネルを使用していますが、フィルターは適切に学習されているようです。データを視覚化すると、入力画像は16（1st Conv）と32フィルター（2番目のCONV）に渡され、画像検査では正常に実行されているようです（つまり、Curves、Crossesなどの機能が検出されます）。

問題はネットワークの完全に接続された部分で発生しているようです。入力画像が何であれ、そのエンコードは いつも 同じ。

私の最初の考えは、「おそらくトレーニング中にゼロを食べているだけだ」ということですが、この間違いを犯したとは思いません（以下のコードを参照）。

編集データセットがシャッフルされておらず、バイアスが導入され、問題の原因になる可能性があることに気付きました。導入した後、平均損失は低く（0.09ではなく0.06）、実際には出力画像はぼやけ8のように見えますが、結論は同じです。エンコードされた入力は、入力画像に関係なく同じです。

ここでは、相対出力を備えたサンプル入力

上の画像のアクティベーションは次のとおりです。2つの完全に接続されたレイヤーが下部にあります（エンコードはBottommostです）。

最後に、ここでは、さまざまな入力の完全に接続されたレイヤーのアクティベーションがあります。各入力画像は、アクティベーション画像の行に対応します。

ご覧のとおり、それらは常に同じ出力を生成します。さまざまな重みを初期化する代わりに転置した重みを使用すると、最初のFC層（中央の画像）がもう少しランダム化されているように見えますが、根底にあるパターンはまだ明らかです。エンコードレイヤー（下部の画像）では、出力は入力が何であれ、常に同じになります（もちろん、パターンは1つのトレーニングと次のトレーニングから異なります）。

関連するコードは次のとおりです

# A placeholder for the input data
x = tf.placeholder('float', shape=(None, mnist.data.shape[1]))

# conv2d_transpose cannot use -1 in output size so we read the value
# directly in the graph
batch_size = tf.shape(x)[0]

# Variables for weights and biases
with tf.variable_scope('encoding'):
    # After converting the input to a square image, we apply the first convolution, using 2x2 kernels
    with tf.variable_scope('conv1'):
        wec1 = tf.get_variable('w', shape=(2, 2, 1, m_c1), initializer=tf.truncated_normal_initializer())
        bec1 = tf.get_variable('b', shape=(m_c1,), initializer=tf.constant_initializer(0))
    # Second convolution
    with tf.variable_scope('conv2'):
        wec2 = tf.get_variable('w', shape=(2, 2, m_c1, m_c2), initializer=tf.truncated_normal_initializer())
        bec2 = tf.get_variable('b', shape=(m_c2,), initializer=tf.constant_initializer(0))
    # First fully connected layer
    with tf.variable_scope('fc1'):
        wef1 = tf.get_variable('w', shape=(7*7*m_c2, n_h1), initializer=tf.contrib.layers.xavier_initializer())
        bef1 = tf.get_variable('b', shape=(n_h1,), initializer=tf.constant_initializer(0))
    # Second fully connected layer
    with tf.variable_scope('fc2'):
        wef2 = tf.get_variable('w', shape=(n_h1, n_h2), initializer=tf.contrib.layers.xavier_initializer())
        bef2 = tf.get_variable('b', shape=(n_h2,), initializer=tf.constant_initializer(0))

reshaped_x = tf.reshape(x, (-1, 28, 28, 1))
y1 = tf.nn.conv2d(reshaped_x, wec1, strides=(1, 2, 2, 1), padding='VALID')
y2 = tf.nn.sigmoid(y1 + bec1)
y3 = tf.nn.conv2d(y2, wec2, strides=(1, 2, 2, 1), padding='VALID')
y4 = tf.nn.sigmoid(y3 + bec2)
y5 = tf.reshape(y4, (-1, 7*7*m_c2))
y6 = tf.nn.sigmoid(tf.matmul(y5, wef1) + bef1)
encode = tf.nn.sigmoid(tf.matmul(y6, wef2) + bef2)

with tf.variable_scope('decoding'):
    # for the transposed convolutions, we use the same weights defined above
    with tf.variable_scope('fc1'):
        #wdf1 = tf.transpose(wef2)
        wdf1 = tf.get_variable('w', shape=(n_h2, n_h1), initializer=tf.contrib.layers.xavier_initializer())
        bdf1 = tf.get_variable('b', shape=(n_h1,), initializer=tf.constant_initializer(0))
    with tf.variable_scope('fc2'):
        #wdf2 = tf.transpose(wef1)
        wdf2 = tf.get_variable('w', shape=(n_h1, 7*7*m_c2), initializer=tf.contrib.layers.xavier_initializer())
        bdf2 = tf.get_variable('b', shape=(7*7*m_c2,), initializer=tf.constant_initializer(0))
    with tf.variable_scope('deconv1'):
        wdd1 = tf.get_variable('w', shape=(2, 2, m_c1, m_c2), initializer=tf.contrib.layers.xavier_initializer())
        bdd1 = tf.get_variable('b', shape=(m_c1,), initializer=tf.constant_initializer(0))
    with tf.variable_scope('deconv2'):
        wdd2 = tf.get_variable('w', shape=(2, 2, 1, m_c1), initializer=tf.contrib.layers.xavier_initializer())
        bdd2 = tf.get_variable('b', shape=(1,), initializer=tf.constant_initializer(0))

u1 = tf.nn.sigmoid(tf.matmul(encode, wdf1) + bdf1)
u2 = tf.nn.sigmoid(tf.matmul(u1, wdf2) + bdf2)
u3 = tf.reshape(u2, (-1, 7, 7, m_c2))
u4 = tf.nn.conv2d_transpose(u3, wdd1, output_shape=(batch_size, 14, 14, m_c1), strides=(1, 2, 2, 1), padding='VALID')
u5 = tf.nn.sigmoid(u4 + bdd1)
u6 = tf.nn.conv2d_transpose(u5, wdd2, output_shape=(batch_size, 28, 28, 1), strides=(1, 2, 2, 1), padding='VALID')
u7 = tf.nn.sigmoid(u6 + bdd2)
decode = tf.reshape(u7, (-1, 784))

loss = tf.reduce_mean(tf.square(x - decode))
opt = tf.train.AdamOptimizer(0.0001).minimize(loss)

try:
    tf.global_variables_initializer().run()
except AttributeError:
    tf.initialize_all_variables().run() # Deprecated after r0.11

print('Starting training...')
bs = 1000 # Batch size
for i in range(501): # Reasonable results around this epoch 
    # Apply permutation of data at each epoch, should improve convergence time
    train_data = np.random.permutation(mnist.data)
    if i % 100 == 0:
        print('Iteration:', i, 'Loss:', loss.eval(feed_dict={x: train_data}))
    for j in range(0, train_data.shape[0], bs):
        batch = train_data[j*bs:(j+1)*bs]
        sess.run(opt, feed_dict={x: batch})
        # TODO introduce noise
print('Training done')

解決

まあ、問題は主にカーネルのサイズに関連していた。（2,2）のストライドで2x2の畳み込みを使用することは悪い考えになりました。 5x5および3x3サイズを使用すると、まともな結果が得られました。

ライセンス： CC-BY-SA と帰属

所属していません datascience.stackexchange