Why can't TensorFlow fit a simple linear model when I minimize mean absolute error instead of mean squared error?

datascience.stackexchange https://datascience.stackexchange.com/questions/15190

Question

I just changed

loss = tf.reduce_mean(tf.square(y - y_data))

a

loss = tf.reduce_mean(tf.abs(y - y_data)) 

and the model is unable to learn; the loss just keeps getting bigger over time. Why?


Solution

I tried this and got the same result.

It is because the gradient of the absolute difference is harder for a simple optimiser to follow to the minimum. With the squared difference, the gradient shrinks smoothly towards zero as you approach the minimum; with the absolute difference, the gradient has a fixed magnitude that abruptly reverses sign at the minimum, which tends to make the optimiser oscillate around it. Basic gradient descent is very sensitive to the magnitude of the gradient and to the learning rate, which is essentially just a multiplier on the gradient that sets the step size.
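This behaviour can be seen without TensorFlow at all. Here is a minimal numpy sketch (a toy one-parameter problem I made up for illustration, not the asker's actual graph): plain gradient descent on a squared error converges because the gradient shrinks near the minimum, while the same descent on an absolute error keeps taking fixed-size steps and bounces back and forth across the minimum.

```python
import numpy as np

def grad_sq(w, t):
    # d/dw (w - t)^2 = 2(w - t): shrinks as w approaches t
    return 2.0 * (w - t)

def grad_abs(w, t):
    # d/dw |w - t| = sign(w - t): fixed magnitude, flips sign at t
    return np.sign(w - t)

target, lr = 2.0, 0.5   # illustrative values, including the 0.5 rate from the question
w_sq = w_abs = 0.3      # arbitrary starting weight

for _ in range(100):
    w_sq  -= lr * grad_sq(w_sq, target)   # steps shrink near the minimum
    w_abs -= lr * grad_abs(w_abs, target) # steps stay lr-sized forever

print(abs(w_sq - target))   # essentially zero
print(abs(w_abs - target))  # stuck a few tenths away, bouncing across the minimum
```

The squared-error run settles on the target, while the absolute-error run ends up oscillating in a band whose width is set by the learning rate.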

The simplest fix is to reduce the learning rate, e.g. change the line

optimizer = tf.train.GradientDescentOptimizer(0.5)

to

optimizer = tf.train.GradientDescentOptimizer(0.05)
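Why this helps can also be checked outside TensorFlow. In the toy numpy sketch below (assumed setup, not the asker's actual model), gradient descent on an absolute error ends up bouncing within roughly one step size of the minimum, so a ten-times-smaller learning rate settles roughly ten times closer.

```python
import numpy as np

def final_gap(lr, target=2.0, w=0.3, steps=200):
    """Run plain gradient descent on |w - target|, return the final error."""
    for _ in range(steps):
        w -= lr * np.sign(w - target)  # step size never shrinks near the minimum
    return abs(w - target)

print(final_gap(0.5))   # stuck a few tenths away
print(final_gap(0.05))  # roughly ten times closer
```

The smaller rate does not remove the oscillation, it just shrinks the band the weight bounces around in, which is usually good enough in practice.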

Also, have a play with different optimisers. Some will be able to cope with .abs-based loss better.
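As one example of why a different optimiser can cope better, here is a numpy sketch of Adagrad-style updates (my own illustration of the idea, not TensorFlow's implementation) on the same toy absolute-error problem: because the accumulated squared gradient keeps growing, the effective step decays like lr/sqrt(t), so the oscillation dies out even with the original large learning rate.

```python
import numpy as np

def adagrad_on_abs(lr=0.5, target=2.0, w=0.3, steps=2000):
    """Adagrad-style updates on |w - target| (illustrative toy problem)."""
    acc = 0.0
    for _ in range(steps):
        g = np.sign(w - target)              # gradient magnitude is always 1
        acc += g * g                         # accumulator grows by 1 each step
        w -= lr * g / (np.sqrt(acc) + 1e-8)  # effective step ~ lr / sqrt(t)
    return abs(w - target)

print(adagrad_on_abs())  # far closer to the minimum than plain gradient descent
```

In TF1 terms that would mean swapping `tf.train.GradientDescentOptimizer` for something like `tf.train.AdagradOptimizer`, though any optimiser whose effective step size shrinks or adapts over time will tame the oscillation.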

Licensed under: CC-BY-SA with attribution