It seems ELU (Exponential Linear Unit) is used as an activation function in deep learning. But its graph is very similar to the graph of $\log(1+e^x)$. So why has $\log(1+e^x)$ not been used as an activation function instead of ELU?

In other words, what is the advantage of ELU over $\log(1+e^x)$?
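
For reference, here is a minimal sketch comparing the two functions, assuming the standard ELU definition with $\alpha = 1$ (identity for $x > 0$, $\alpha(e^x - 1)$ for $x \le 0$) and softplus as $\log(1+e^x)$:

```python
import numpy as np

def elu(x, alpha=1.0):
    # Standard ELU: x for x > 0, alpha * (exp(x) - 1) for x <= 0
    return np.where(x > 0, x, alpha * np.expm1(x))

def softplus(x):
    # log(1 + e^x), computed in a numerically stable way
    return np.logaddexp(0.0, x)

xs = np.linspace(-5, 5, 11)
print(np.round(elu(xs), 3))       # saturates toward -alpha for large negative x
print(np.round(softplus(xs), 3))  # saturates toward 0 for large negative x
```

One visible difference from the printed values: ELU approaches $-\alpha$ for large negative inputs, while softplus approaches $0$ and is always positive.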
