Question

I implemented a binary Logistic Regression classifier. Just to play around, I replaced the sigmoid function 1 / (1 + exp(-z)) with tanh. The results were exactly the same, with the same 0.5 classification threshold, even though tanh has range (-1, 1) while sigmoid has range (0, 1).

Does it really matter that we use the sigmoid function, or can any differentiable non-linear function like tanh work?

Thanks.


Solution

Did you also change the function during training, or did you keep the same trained model and only swap sigmoid for tanh at prediction time?

I think what has very likely happened is the following. Have a look at the graphs of sigmoid and tanh:

sigmoid: http://www.wolframalpha.com/input/?i=plot+sigmoid%28x%29+for+x%3D%28-1%2C+1%29
tanh: http://www.wolframalpha.com/input/?i=plot+tanh%28x%29+for+x%3D%28-1%2C+1%29

We can see that tanh reaches y = 0.5 at roughly x ≈ 0.55, while the sigmoid reaches 0.5 already at x = 0, and at x ≈ 0.55 the sigmoid is roughly y ≈ 0.63. So the two classifiers disagree only for raw scores between 0 and about 0.55, which corresponds to sigmoid outputs between 0.5 and roughly 0.63: for those points the sigmoid predicts the positive class but tanh does not. What has probably happened is that your data contains no point whose score falls in this narrow band, hence you get exactly the same results. Try printing the sigmoid values for your data and check whether any fall between 0.5 and about 0.63.
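A quick sketch of this disagreement band (the raw scores `z` below are made-up values, not from your data): with a 0.5 threshold, sigmoid and tanh can produce different labels for small positive scores.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# For raw scores z between 0 and artanh(0.5) ~= 0.549, a 0.5 threshold
# disagrees: sigmoid(z) > 0.5 but tanh(z) < 0.5.
for z in [0.1, 0.3, 0.5, 0.7]:
    s, t = sigmoid(z), math.tanh(z)
    print(f"z={z:.1f}  sigmoid={s:.3f} -> {int(s > 0.5)}   tanh={t:.3f} -> {int(t > 0.5)}")
```

If none of your points produce a score in that band, the two thresholded classifiers agree everywhere, which would explain the identical results.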

The reason for using the sigmoid function is that it is derived from probability theory and maximum likelihood: its output can be read directly as P(y = 1 | x). Other monotonic functions may behave very similarly, but they lack this probabilistic interpretation. For details see, for example, http://luna.cas.usf.edu/~mbrannic/files/regression/Logistic.html or http://www.cs.cmu.edu/~tom/mlbook/NBayesLogReg.pdf
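To make the maximum-likelihood connection concrete, here is a minimal training sketch on a made-up 1-D toy dataset (the data, learning rate, and iteration count are illustrative assumptions, not anything from the question). Treating sigmoid(w*x + b) as P(y = 1 | x), the gradient of the Bernoulli log-likelihood per point takes the simple form (y - p):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy 1-D dataset: (feature, label). Negative x -> class 0, positive x -> class 1.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b, lr = 0.0, 0.0, 0.1

# Ascend the Bernoulli log-likelihood sum(y*log(p) + (1-y)*log(1-p)):
# its gradient per point is (y - p) * x for w and (y - p) for b.
for _ in range(200):
    for x, y in data:
        p = sigmoid(w * x + b)
        w += lr * (y - p) * x
        b += lr * (y - p)

print(f"w={w:.3f}, b={b:.3f}")  # w ends up positive, separating the classes
```

Swapping in tanh here would change the loss and its gradient, not just the final thresholding, which is why the training derivation is tied to the sigmoid.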

OTHER TIPS

The range of the function should be (0, 1), as its output represents the probability of the positive outcome.
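It is worth noting that tanh is just a shifted and rescaled sigmoid, so an affine transformation recovers a valid probability from it. A small sketch verifying the identities tanh(z) = 2·sigmoid(2z) − 1 and sigmoid(z) = (tanh(z/2) + 1)/2:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# tanh and sigmoid differ only by shift and scale:
#   tanh(z) = 2 * sigmoid(2z) - 1
#   sigmoid(z) = (tanh(z / 2) + 1) / 2
# so thresholding sigmoid(z) at 0.5 is the same as thresholding tanh(z) at 0.
for z in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    assert abs(math.tanh(z) - (2 * sigmoid(2 * z) - 1)) < 1e-12
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.4f}  (tanh(z/2)+1)/2={(math.tanh(z / 2) + 1) / 2:.4f}")
```

This also shows why using tanh with a 0.5 threshold (rather than 0) is where the mismatch in the question comes from.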

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow