Gradient descent does not converge in some runs and converges in other runs in the following simple Keras network

datascience.stackexchange https://datascience.stackexchange.com/questions/85507

Question

When training a simple Keras NN (1 input, 1 layer with 1 unit, for a regression task), during some runs I get a large constant loss that does not change over 80 epochs. During other runs it decreases. What may be the reason that gradient descent does not converge in some runs but converges in others in the following network?

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Generate data

start, stop = 1, 100
cnt = stop - start + 1
xs = np.linspace(start, stop, num=cnt)
b, k = 1, 2
ys = np.array([k*x + b for x in xs])

# Simple model with one feature and one unit for regression task

model = keras.Sequential([
    layers.Dense(units=1, input_shape=[1], activation='relu')
])
model.compile(loss='mae', optimizer='adam')
batch_size = int(cnt / 5)
epochs = 80

Next comes a callback that saves the Keras model weights at some frequency. According to the Keras docs:

save_freq: 'epoch' or integer. When using 'epoch', the callback should save the model after each epoch. When using integer, the callback should save the model at end of this many batches.

weights_dict = {}
weight_callback = tf.keras.callbacks.LambdaCallback \
( on_epoch_end=lambda epoch, logs:  weights_dict.update({epoch:model.get_weights()}))
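
Note that the quoted save_freq option belongs to tf.keras.callbacks.ModelCheckpoint rather than to LambdaCallback. A minimal sketch of that documented alternative, in case on-disk checkpoints are preferred (the file path is purely illustrative):

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath='weights_epoch_{epoch:02d}.h5',  # illustrative path
    save_weights_only=True,
    save_freq='epoch')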

Train model:

history = model.fit(xs, ys, batch_size=batch_size, epochs=epochs, callbacks=[weight_callback])

I get:

Epoch 1/80
5/5 [==============================] - 0s 770us/step - loss: 102.0000
Epoch 2/80
5/5 [==============================] - 0s 802us/step - loss: 102.0000
Epoch 3/80
5/5 [==============================] - 0s 750us/step - loss: 102.0000
Epoch 4/80
5/5 [==============================] - 0s 789us/step - loss: 102.0000
Epoch 5/80
5/5 [==============================] - 0s 745us/step - loss: 102.0000
Epoch 6/80
...
...
...
Epoch 78/80
5/5 [==============================] - 0s 902us/step - loss: 102.0000
Epoch 79/80
5/5 [==============================] - 0s 755us/step - loss: 102.0000
Epoch 80/80
5/5 [==============================] - 0s 1ms/step - loss: 102.0000

Weights:

for epoch, weights in weights_dict.items():
    print("*** Epoch: ", epoch, "\nWeights: ", weights)

Output:

*** Epoch:  0 
Weights:  [array([[-0.44768167]], dtype=float32), array([0.], dtype=float32)]
*** Epoch:  1 
Weights:  [array([[-0.44768167]], dtype=float32), array([0.], dtype=float32)]
*** Epoch:  2 
Weights:  [array([[-0.44768167]], dtype=float32), array([0.], dtype=float32)]
*** Epoch:  3 
Weights:  [array([[-0.44768167]], dtype=float32), array([0.], dtype=float32)]
...
...
               

As you can see, the weights do not change either, and the bias stays at 0.

Yet on other runs gradient descent converges: weights and a non-zero bias are fitted, and the loss ends up much smaller. The behaviour is reproducible in the sense that, with exactly the same set of parameters, roughly 30% of runs converge while the other 70% do not. Why does it sometimes converge and sometimes not, with the same data and parameters?


Solution

There are random elements when using packages such as TensorFlow, NumPy, etc. Some examples include:

  • How the weights are initialized.
  • How the data is shuffled (if enabled) into each batch. Batches containing different data will produce different gradients, which might influence convergence.

This means that even when you run the same code, it is actually not 100% the same run, which is why you get different results.

If you want the same results, you should fix the random seed as follows: tf.random.set_seed(1234). This is usually done right after the imports. The value 1234 can be any integer; for example, with a value of 500 I also get reproducible results and good convergence.

Some other points to note

  • If I remember correctly, calculations performed on a GPU might also introduce random factors.
  • It is also a good idea to fix the NumPy seed, the Python random package seed, and any function that takes a seed value, e.g. sklearn.model_selection.train_test_split (see the sketch after this list).
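
A minimal sketch that fixes all of these together, assuming scikit-learn is available (the split itself is only illustrative for this dataset):

import os
import random

import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

SEED = 1234  # any integer works

os.environ['PYTHONHASHSEED'] = str(SEED)  # hash-based randomness
random.seed(SEED)                         # Python's built-in random module
np.random.seed(SEED)                      # NumPy
tf.random.set_seed(SEED)                  # TensorFlow / Keras

# Functions that take an explicit seed, e.g. sklearn's train_test_split
x_train, x_test, y_train, y_test = train_test_split(
    xs, ys, test_size=0.2, random_state=SEED)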

OTHER TIPS

Getting reproducible results with Keras is not straightforward (see https://machinelearningmastery.com/reproducible-results-neural-networks-keras/), since TensorFlow, NumPy and the working environment itself can introduce different seeds affecting different parts of the training. In order to get a reproducible training process, try setting the following random seeds and initializing your layers as:

import os
import numpy as np
import tensorflow as tf
from tensorflow import keras

seed_value = 0

os.environ['PYTHONHASHSEED'] = str(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

my_init = keras.initializers.glorot_uniform(seed=seed_value)
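
The initializer can then be passed to the layer; a short sketch reusing the imports and model definition from the question:

model = keras.Sequential([
    layers.Dense(units=1, input_shape=[1], activation='relu',
                 kernel_initializer=my_init)
])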

If the NN is struggling to converge, it may be because the optimizer is not using the best choice of hyperparameters; for example, too large a learning_rate can jeopardize the convergence of the method.

Try setting a lower learning_rate value, which may make convergence slower but more robust.
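
For example, a sketch of compiling the question's model with a smaller learning rate (Adam's default is 0.001; the value below is only an illustration):

model.compile(loss='mae',
              optimizer=keras.optimizers.Adam(learning_rate=1e-4))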

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange