Question

My question is: what will happen if I arrange different activation functions within the same layer of a neural network and continue the same pattern across the other hidden layers?

Suppose I have 3 ReLU units at the start, then 3 tanh units, and then units with other activation functions within the same hidden layer. For the other hidden layers, I scale the number of nodes by the same factor (decreasing or increasing), while the arrangement and order of the activation functions stays the same.
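
For concreteness, a minimal sketch of what such a mixed-activation layer could look like (this assumes PyTorch and hypothetical layer sizes, purely for illustration):

```python
import torch
import torch.nn as nn

class MixedActivationLayer(nn.Module):
    """One hidden layer in which the first 3 units use ReLU and the next 3 use tanh."""

    def __init__(self, in_features: int, relu_units: int = 3, tanh_units: int = 3):
        super().__init__()
        self.linear = nn.Linear(in_features, relu_units + tanh_units)
        self.relu_units = relu_units

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.linear(x)
        a_relu = torch.relu(z[:, : self.relu_units])  # first block of units: ReLU
        a_tanh = torch.tanh(z[:, self.relu_units :])  # remaining units: tanh
        return torch.cat([a_relu, a_tanh], dim=1)

# Example: a batch of 4 samples with 8 input features each
layer = MixedActivationLayer(in_features=8)
print(layer(torch.randn(4, 8)).shape)  # torch.Size([4, 6])
```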


Solution

At a high level, using multiple activation functions in a neural network may not work well, because these activation functions respond very differently to the same input. For an input of -1.56, ReLU will give 0, sigmoid will give about 0.174, and tanh will give about -0.92. These differences will not allow the gradients to flow uniformly during backpropagation.
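
For reference, those numbers can be reproduced with a few lines of NumPy (a minimal sketch, using the same input value as above):

```python
import numpy as np

x = -1.56
print(max(0.0, x))               # ReLU:    0.0
print(1.0 / (1.0 + np.exp(-x)))  # sigmoid: ~0.174
print(np.tanh(x))                # tanh:    ~-0.915
```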

You may still try using two different activation functions in the same layer. A few issues that may arise:

  • Consider the combination of tanh and ReLU: we can run into the problem of exploding gradients. ReLU passes any zero or positive input through unchanged, whereas tanh squashes its output into the range [-1, 1]. A large positive value will therefore pass through ReLU untouched, but through tanh it produces a fully saturated firing, i.e. an output of (practically) 1 every time (see the sketch after this list).

  • Therefore, when using ReLU in a classification problem, we use softmax at the output instead of the sigmoid function; fed with large ReLU activations, a sigmoid output layer would saturate and you would probably get outputs of one only.
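
A minimal sketch of the saturation effect described in the first bullet (NumPy, with arbitrarily chosen positive inputs):

```python
import numpy as np

for x in [1.0, 5.0, 20.0]:
    relu_out = max(0.0, x)  # ReLU passes the value through unchanged: 1.0, 5.0, 20.0
    tanh_out = np.tanh(x)   # tanh saturates quickly: ~0.762, ~0.9999, ~1.0
    print(x, relu_out, tanh_out)
```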

Instead of the ReLU-Tanh combination, you may want to use the Sigmoid-Tanh combination.

  • Here, the sigmoid gives an output close to zero when it receives a strongly negative value. Such negative values may be supplied by the tanh units in the preceding layers, once the weights scale them further. You may face overfitting here; a short illustration follows below.
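
To make that last point concrete, here is how the sigmoid treats negative inputs (a NumPy sketch with arbitrary values; the output only approaches zero once the input is strongly negative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in [-0.5, -1.0, -5.0]:
    print(z, sigmoid(z))  # ~0.378, ~0.269, ~0.0067
```
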
Licensed under: CC-BY-SA with attribution