¿Cómo se debe estructurar una red neuronal para la aproximación de la función no unida?

https://datascience.stackexchange.com/questions/9495

16-10-2019
|

Pregunta

He oído que un perceptrón de múltiples capas puede aproximar cualquier función arbitrariamente exacta, dadas suficientes neuronas. Quería probarlo, así que escribí el siguiente código:

#!/usr/bin/env python

"""Example for learning a regression."""


import tensorflow as tf
import numpy


def plot(xs, ys_truth, ys_pred):
    """
    Plot the true values and the predicted values.

    Parameters
    ----------
    xs : list
        Numeric values
    ys_truth : list
        Numeric values, same length as `xs`
    ys_pred : list
        Numeric values, same length as `xs`
    """
    import matplotlib.pyplot as plt
    truth_plot, = plt.plot(xs, ys_truth, '-o', color='#00ff00')
    pred_plot, = plt.plot(xs, ys_pred, '-o', color='#ff0000')
    plt.legend([truth_plot, pred_plot],
               ['Truth', 'Prediction'],
               loc='upper center')
    plt.savefig('plot.png')


# Parameters
learning_rate = 0.1
momentum = 0.6
training_epochs = 1000
display_step = 100

# Generate training data
train_X = []
train_Y = []

# First simple test: a linear function
f = lambda x: x+4

# Second, more complicated test: x^2
# f = lambda x: x**2

for x in range(-20, 20):
    train_X.append(float(x))
    train_Y.append(f(x))
train_X = numpy.asarray(train_X)
train_Y = numpy.asarray(train_Y)
n_samples = train_X.shape[0]

# Graph input
X = tf.placeholder(tf.float32)
reshaped_X = tf.reshape(X, [-1, 1])
Y = tf.placeholder("float")

# Create Model
W1 = tf.Variable(tf.truncated_normal([1, 100], stddev=0.1), name="weight")
b1 = tf.Variable(tf.constant(0.1, shape=[1, 100]), name="bias")
mul = tf.matmul(reshaped_X, W1)
h1 = tf.nn.sigmoid(mul) + b1
W2 = tf.Variable(tf.truncated_normal([100, 100], stddev=0.1), name="weight")
b2 = tf.Variable(tf.constant(0.1, shape=[100]), name="bias")
h2 = tf.nn.sigmoid(tf.matmul(h1, W2)) + b2
W3 = tf.Variable(tf.truncated_normal([100, 1], stddev=0.1), name="weight")
b3 = tf.Variable(tf.constant(0.1, shape=[1]), name="bias")
# identity as activation to get arbitrary output
activation = tf.matmul(h2, W3) + b3

# Minimize the squared errors
l2_loss = tf.reduce_sum(tf.pow(activation-Y, 2))/(2*n_samples)
optimizer = tf.train.MomentumOptimizer(learning_rate, momentum).minimize(l2_loss)

# Initializing the variables
init = tf.initialize_all_variables()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Fit all training data
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            sess.run(optimizer, feed_dict={X: x, Y: y})

        # Display logs per epoch step
        if epoch % display_step == 0:
            cost = sess.run(l2_loss, feed_dict={X: train_X, Y: train_Y})
            print("cost=%s\nW1=%s" % (cost, sess.run(W1)))

    print("Optimization Finished!")
    print("cost=%s W1=%s" %
          (sess.run(l2_loss, feed_dict={X: train_X, Y: train_Y}),
           sess.run(W1)))  # "b2=", sess.run(b2)

    # Get output and plot it
    ys_pred = []
    ys_truth = []

    test_X = []
    for x in range(-40, 40):
        test_X.append(float(x))

    for x in test_X:
        ret = sess.run(activation, feed_dict={X: x})
        ys_pred.append(list(ret)[0][0])
        ys_truth.append(f(x))
    plot(train_X.tolist(), ys_truth, ys_pred)

Este tipo de funciona para funciones lineales (al menos para los datos de capacitación, no tanto para los datos de prueba fuera de la gama):

Sin embargo, no funciona en absoluto para la función no lineal $ x^2 $:

¿Por qué esta red neuronal no funciona para una aproximación de función tan simple? ¿Qué tengo que cambiar para que la misma topología de red funcione para ambas funciones?

Solución

Esto no responde a su pregunta directamente, pero puede incluir algunos consejos útiles:

En lo reciente Aprendizaje residual profundo para el reconocimiento de imágenes papel está escrito:

Si uno plantea la hipótesis de que múltiples capas no lineales pueden aproximar las funciones complicadas asintóticamente [...

Esta hipótesis, sin embargo, sigue siendo una pregunta abierta. Ver [28].

Licenciado bajo: CC-BY-SA con atribución

No afiliado a datascience.stackexchange