Why don't they use all kinds of non-linear functions in Neural Network Activation Functions?

cs.stackexchange https://cs.stackexchange.com/questions/129401

Question

Pardon my ignorance, but after just learning about the Sigmoid and Tanh activation functions (and a few others), I am wondering why they choose functions that always go up and to the right? Why not use all kinds of crazy input functions, those that fluctuate up and down, ones that are directed down instead of up, etc.? What would be the problem if you used functions like those in your neurons, and why isn't it done? Why do they stick to such primitive, simple functions?



Solution

I am wondering why they choose functions that always go up and to the right? Why not use [...] ones that are directed down instead of up, etc.?

The activation function is applied to a weighted sum of the inputs of the neuron. Replacing a function that is "directed up" with one that is "directed down" is equivalent to changing the signs of the weights. There is no particular reason to prefer an increasing function to a decreasing one; it just happens to be the convention.
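To make the sign-flip argument concrete, here is a minimal NumPy sketch (the toy neuron and the specific numbers are my own illustration, not from the question): a neuron that uses a "directed down" sigmoid produces exactly the same output as an ordinary increasing-sigmoid neuron whose weights and bias have their signs flipped.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decreasing_sigmoid(z):
    # A "directed down" version of the sigmoid: it decreases as z grows.
    return sigmoid(-z)

# A toy neuron: weighted sum of the inputs, then the activation.
rng = np.random.default_rng(0)
x = rng.normal(size=5)   # arbitrary inputs
w = rng.normal(size=5)   # arbitrary weights
b = 0.3                  # arbitrary bias

# Using the decreasing activation with weights (w, b) ...
out_decreasing = decreasing_sigmoid(w @ x + b)

# ... gives exactly the same output as the ordinary increasing sigmoid
# with the signs of the weights and the bias flipped.
out_increasing = sigmoid((-w) @ x + (-b))

assert np.isclose(out_decreasing, out_increasing)
print(out_decreasing, out_increasing)
```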

Why not use all kinds of crazy input functions, those that fluctuate up and down

In principle, you could. People use the functions that work best. If somebody found a "crazy" activation function that yielded good results, it would certainly get used. However, a "crazy" activation function usually makes the network much harder to train. Part of the reason ReLU is popular, for example, is that its simplicity makes training easier.
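To illustrate the "in principle, you could" point, here is a small NumPy sketch of a one-hidden-layer network in which the activation is just a pluggable function; the oscillating `wavy` activation and all the weights are arbitrary choices of mine. Mechanically, nothing prevents the "crazy" activation; the difficulties only appear once you try to train with it.

```python
import numpy as np

# Two candidate activations: the conventional ReLU and a "crazy"
# non-monotonic one that fluctuates up and down.
relu = lambda z: np.maximum(0.0, z)
wavy = lambda z: np.sin(3.0 * z)   # hypothetical oscillating activation

def forward(x, W1, b1, W2, b2, act):
    """One hidden layer; the activation is just a function we pass in."""
    return W2 @ act(W1 @ x + b1) + b2

rng = np.random.default_rng(1)
x = rng.normal(size=4)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)

# Both activations produce a perfectly valid forward pass.
print("ReLU net output:", forward(x, W1, b1, W2, b2, relu))
print("Wavy net output:", forward(x, W1, b1, W2, b2, wavy))
# The difference shows up during training: an oscillating activation
# tends to create a much bumpier loss surface, which gradient descent
# handles less reliably than the piecewise-linear ReLU.
```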

Other tips

  • Why do they go up? This is called being increasing. It expresses that more stimulus implies more activation. You might ask then, why would they want to express that? No particular reason other than imitating nature.
  • Why do they go to the right? I don't really understand this question. The $X$-axis, which in Cartesian coordinates represents the input of a function of a real variable, is by convention drawn horizontally with the values increasing from left to right.
  • Why not use all kinds of crazy input functions, those that fluctuate up and down? The more complicated the function, the harder it can be to compute. For a computation that is going to be done many times, one would like it to be as simple as possible. Not being monotonic (going up and down) also doesn't add much to how expressive a neural network is: you can model a neuron with a non-monotonic activation function as the superposition of a few neurons with monotonic activation functions.$*$ See the sketch after this list for a concrete example.
  • [Why not use] ones that are directed down instead of up? Increasing or decreasing doesn't matter, because this can be expressed by a change of sign of the weights that represent the influence of a neuron on another. Recall that if $f(x)$ is decreasing, then $f(-x)$ and $-f(x)$ are increasing.
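Here is a small NumPy sketch of the superposition claim marked $*$ above (the shift of 2.0 and the output weights of +1 and -1 are arbitrary choices of mine): a non-monotonic "bump" activation, which rises and then falls, is reproduced exactly by two ordinary monotonic sigmoid neurons whose outputs are combined with weights +1 and -1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bump(z):
    # A non-monotonic "bump" activation: it rises and then falls again.
    return sigmoid(z + 2.0) - sigmoid(z - 2.0)

def bump_as_two_neurons(z):
    # The same bump as the superposition of two monotonic sigmoid neurons
    # that share the input, combined with output weights +1 and -1.
    hidden = np.array([sigmoid(z + 2.0), sigmoid(z - 2.0)])
    output_weights = np.array([1.0, -1.0])
    return output_weights @ hidden

zs = np.linspace(-6.0, 6.0, 7)
assert np.allclose([bump(z) for z in zs],
                   [bump_as_two_neurons(z) for z in zs])
```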

$*$ You can observe that this trades a more complex anatomy of the neural network for a simpler structure of the individual neuron. Is it worth it? I don't know. Often, when a neural network has some knowledge (usually after being trained to learn something), it is hard to identify which aspects of its structure correspond to which aspects of that knowledge. Potentially, as more research is done, techniques could be developed for replacing, in an already trained network, blocks of neurons with a simple activation by a simpler block of neurons with a more complex activation. I think I am starting to talk science fiction with this surgery of neural networks.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange