Why can't TensorFlow fit a simple linear model when minimizing mean absolute error instead of mean squared error?

datascience.stackexchange https://datascience.stackexchange.com/questions/15190

Question

In the TensorFlow Introduction example I have just changed

loss = tf.reduce_mean(tf.square(y - y_data))

to

loss = tf.reduce_mean(tf.abs(y - y_data)) 

and the model is unable to learn: the loss just gets bigger over time. Why?
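For context, here is a minimal sketch of the script in question, based on the classic TensorFlow getting-started example (assuming TensorFlow 1.x, where tf.train.GradientDescentOptimizer lives; which tutorial "Introduction" refers to is an assumption). The only change from the tutorial is the loss line:

import numpy as np
import tensorflow as tf

# Synthetic data: y = 0.1 * x + 0.3
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.1 + 0.3

# Linear model with a trainable weight and bias
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b

# The only change: mean absolute error instead of mean squared error
loss = tf.reduce_mean(tf.abs(y - y_data))

optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for step in range(201):
    sess.run(train)
    if step % 20 == 0:
        print(step, sess.run(loss))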


Solution

I tried this and got the same result.

It is because the gradient of .abs is harder for a simple optimiser to follow to the minimum. With the squared difference, the gradient shrinks smoothly towards zero as you approach the minimum; with the absolute difference, the gradient has a fixed magnitude that abruptly reverses sign at the minimum, which tends to make the optimiser oscillate around it. Basic gradient descent is very sensitive to the magnitude of the gradient, and to the learning rate, which is essentially just a multiplier of the gradient when computing step sizes.
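To see the difference concretely, here is a small NumPy illustration (not part of the original answer) of the two gradients as the residual shrinks:

import numpy as np

# Residuals (y - y_data) approaching the minimum from both sides
e = np.array([1.0, 0.1, 0.01, -0.01, -0.1, -1.0])

# Gradient of the squared loss, d(e**2)/de = 2*e: shrinks with the residual
print(2 * e)        # [ 2.    0.2   0.02 -0.02 -0.2  -2.  ]

# Gradient of the absolute loss, d|e|/de = sign(e): fixed magnitude, flips sign
print(np.sign(e))   # [ 1.  1.  1. -1. -1. -1.]

With a learning rate of 0.5 and a gradient whose magnitude never shrinks, the parameters keep taking large steps back and forth across the minimum instead of settling into it.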

The simplest fix is to reduce the learning rate, e.g. change the line

optimizer = tf.train.GradientDescentOptimizer(0.5)

to

optimizer = tf.train.GradientDescentOptimizer(0.05)

Also, have a play with different optimisers; some will be able to cope with the .abs-based loss better.
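For example (a sketch against the same assumed TF 1.x script), Adam adapts its effective step size per parameter, so it typically handles the constant-magnitude gradient of the absolute loss more gracefully:

# Swap in Adam; tf.train.AdamOptimizer is the TF 1.x API
optimizer = tf.train.AdamOptimizer(learning_rate=0.05)
train = optimizer.minimize(loss)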

Licensed under: CC-BY-SA with attribution