Question

I am working on a neural network in TensorFlow that predicts win/draw/loss percentages for given game data. The labels I provide are always {1, 0, 0}, {0, 1, 0} or {0, 0, 1}. After some epochs my accuracy doesn't increase any further, but the loss still decreases for many more epochs (also on the validation set, though very slowly). I am using a softmax activation in the last layer and the categorical cross-entropy loss function provided by Keras. I was wondering if, in this case, lower loss always corresponds to better probabilities (because I obviously wouldn't want the net to output only values like 1 or 0 for probabilities), or in other words, does this net output the "true" probabilities, and if so, why?

Solution

If $0.5$ is the threshold for declaring a class (admittedly more natural in binary classification than in your three-class problem), accuracy gives the model no incentive to predict $0.95$ rather than $0.51$ for a true label of $1$.

Meanwhile, your cross-entropy loss function sees that the correct answer is $1$ and wants to push the predicted probability as close to $1$ as it can. Accuracy, however, doesn't care whether the predicted probability is $0.51$ or $0.95$, so accuracy does not change as you move the predicted probability closer and closer to the observed value, even though the loss keeps decreasing as the prediction approaches the observation (as you would expect loss to do... consider how squared loss behaves in linear regression).
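To see this concretely, here is a small self-contained sketch (pure Python, no Keras, with illustrative numbers) comparing two prediction sets that are equally accurate but differ sharply in cross-entropy:

```python
import math

def cross_entropy(y_true, y_pred):
    # Mean categorical cross-entropy over a batch of one-hot labels:
    # -log of the probability assigned to the true class, averaged.
    return -sum(
        math.log(p[t.index(1)]) for t, p in zip(y_true, y_pred)
    ) / len(y_true)

def accuracy(y_true, y_pred):
    # Fraction of samples whose argmax prediction matches the true class.
    return sum(
        t.index(1) == p.index(max(p)) for t, p in zip(y_true, y_pred)
    ) / len(y_true)

labels    = [[1, 0, 0], [0, 1, 0]]            # one-hot win/draw/loss labels
confident = [[0.95, 0.03, 0.02], [0.03, 0.95, 0.02]]
hesitant  = [[0.51, 0.25, 0.24], [0.25, 0.51, 0.24]]

print(accuracy(labels, confident), accuracy(labels, hesitant))
# both 1.0 — accuracy cannot tell them apart
print(cross_entropy(labels, confident), cross_entropy(labels, hesitant))
# roughly 0.051 vs 0.673 — loss strongly prefers the confident predictions
```

This is exactly the situation in the question: accuracy saturates while the loss continues to reward pushing probabilities toward the one-hot targets.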

Other tips

Consider loss to be something akin to the implied variability of your model. When loss is extremely high, your model's output could be just about anything. As the loss falls, your model becomes more confident in its output and will give similar classifications/regressions even if the initial weights or input data are slightly different. Lower loss on the training set is always a good thing so long as you are seeing a corresponding decrease in the loss on the validation set. As soon as that validation loss starts to stagnate, though, I like to save the model and quit. Usually at this point you could still squeeze out a little more training loss, but it tends to be overfitting and doesn't help on the test set.
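The save-and-quit rule above is just early stopping with patience. In Keras you would typically use `tf.keras.callbacks.EarlyStopping(monitor='val_loss', restore_best_weights=True)`; the logic itself can be sketched in a few lines of plain Python (the function name, the `patience` value, and the loss sequence below are illustrative, not from the original post):

```python
def train_with_early_stopping(val_losses, patience=3):
    """Given per-epoch validation losses, return (best_epoch, best_loss):
    stop once val loss has not improved for `patience` consecutive epochs,
    keeping the epoch whose weights you would have saved."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break  # validation loss has stagnated: quit here
    return best_epoch, best_loss

# Validation loss improves for three epochs, then creeps back up:
print(train_with_early_stopping([1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.6]))
# → (2, 0.7): training stops at epoch 5, before ever seeing the 0.6
```

Note the trade-off the last line shows: a small patience can quit before a late improvement, so patience is a hyperparameter worth tuning.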

Licensed under: CC-BY-SA with attribution