Question

What is the loss function for each different activation function?

Background

The choice of the loss function of a neural network depends on the activation function. For sigmoid activation, the cross-entropy (log) loss results in a simple gradient for the weight update, (z - y) * x, where z is the output of the neuron and y is the label.

This simplicity with the log loss is possible because, in my understanding, the derivative of the sigmoid cancels against the derivative of the log loss. An activation function other than sigmoid, one that lacks this property, would not be a good combination with the log loss. So what are the appropriate loss functions for ReLU, ELU, and SELU?
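
For concreteness, here is a small numerical check of that cancellation, written in plain NumPy with made-up values (the variable names and setup are illustrative assumptions, not taken from any particular library):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def log_loss(z, y):
        # binary cross-entropy (log loss) for a single example
        return -(y * np.log(z) + (1.0 - y) * np.log(1.0 - z))

    rng = np.random.default_rng(0)
    x = rng.normal(size=3)   # input vector
    w = rng.normal(size=3)   # weights of a single sigmoid neuron
    y = 1.0                  # label

    z = sigmoid(w @ x)
    analytic = (z - y) * x   # the claimed gradient: sigmoid' cancels in the chain rule

    # central finite differences of the loss w.r.t. each weight
    eps = 1e-6
    numeric = np.array([
        (log_loss(sigmoid((w + eps * np.eye(3)[i]) @ x), y)
         - log_loss(sigmoid((w - eps * np.eye(3)[i]) @ x), y)) / (2 * eps)
        for i in range(3)
    ])

    print(np.allclose(analytic, numeric))  # True: the two gradients agree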

Solution

The question requires some preliminary clarification IMHO. The choice of activation and loss function both depend on your task, on the kind of problem you want to solve. Here are some examples:

  • If you are training a binary classifier, you can solve the problem with sigmoid activation + binary cross-entropy loss.
  • If you are training a multi-class classifier, then you need softmax activation + categorical cross-entropy loss.
  • If you are training a regressor, you usually need a proper activation function with MSE or MAE loss. By "proper" I mean linear, in case your output is unbounded, or ReLU, in case your output takes only positive values. There are countless examples; see the sketch after this list.
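
To make these pairings concrete, here is a minimal Keras-style sketch (assuming TensorFlow is installed; the helper name build_model, the layer sizes, and the task labels are illustrative assumptions, not a definitive recipe):

    import tensorflow as tf

    def build_model(task, n_inputs, n_classes=3):
        # The hidden part is arbitrary; only the output layer + loss change with the task.
        layers = [tf.keras.layers.Dense(32, activation="relu", input_shape=(n_inputs,))]
        if task == "binary":
            layers.append(tf.keras.layers.Dense(1, activation="sigmoid"))
            loss = "binary_crossentropy"
        elif task == "multiclass":
            # assumes one-hot encoded labels
            layers.append(tf.keras.layers.Dense(n_classes, activation="softmax"))
            loss = "categorical_crossentropy"
        else:
            # regression with an unbounded output
            layers.append(tf.keras.layers.Dense(1, activation="linear"))
            loss = "mse"
        model = tf.keras.Sequential(layers)
        model.compile(optimizer="adam", loss=loss)
        return model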

By activation function here I mean the activation of the output layer. The activation that produces the final prediction scores doesn't have to be (and usually isn't) the same as the one used in the hidden layers.

ELU and SELU are typically used for the hidden layers of a Neural Network; I have personally never heard of ELU or SELU being used for final outputs.
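
For instance, a binary classifier might use SELU in its hidden layers (paired with the lecun_normal initializer it was designed around) while the output layer is still a task-driven sigmoid. A Keras-style sketch, again with illustrative sizes:

    import tensorflow as tf

    model = tf.keras.Sequential([
        # SELU in the hidden layers only
        tf.keras.layers.Dense(64, activation="selu",
                              kernel_initializer="lecun_normal",
                              input_shape=(20,)),
        tf.keras.layers.Dense(64, activation="selu",
                              kernel_initializer="lecun_normal"),
        # output activation and loss are dictated by the task (binary classification)
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")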

Both the choice of final activation and the choice of loss function depend on the task; this is the only criterion to follow to implement a good Neural Network.

Licensed under: CC-BY-SA with attribution