What is the gradient descent rule using binary cross entropy (BCE) with tanh?
10-12-2020
Question
Similar to this post, I need the gradient descent step for tanh, but now with binary cross-entropy (BCE) as the loss.
So we have
$$ \Delta \omega = -\eta \frac{\partial E}{\partial \omega} $$
Now we have BCE:
$$ E = -\left(y\log(\hat{y}) + (1-y)\log(1-\hat{y})\right) $$
Considering my output is $\hat{y} = \tanh(\omega \cdot x)$, where $x$ is my input vector and $y$ is the corresponding label, we get $$ \frac{\partial E}{\partial \omega} = \frac{\partial}{\partial \omega}\left[-\left(y\log(\tanh(\omega x)) + (1-y)\log(1-\tanh(\omega x))\right)\right] $$
Now on this website they do something similar for the normal sigmoid and arrive at (eq 60):
$$ \frac{\sigma'(z)x}{\sigma(z)(1-\sigma(z))}(\sigma(z)-y) $$
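As a quick sanity check of that sigmoid expression (a minimal sketch; the values of $w$, $x$, $y$ below are arbitrary), note that since $\sigma'(z) = \sigma(z)(1-\sigma(z))$ the whole thing collapses to $x(\sigma(z)-y)$, and both forms match a finite-difference estimate of the gradient:

```python
import math

def sigma(z):
    # Logistic sigmoid
    return 1.0 / (1.0 + math.exp(-z))

def bce_sigmoid(w, x, y):
    # BCE loss with a sigmoid output unit, z = w*x
    s = sigma(w * x)
    return -(y * math.log(s) + (1 - y) * math.log(1 - s))

def grad_eq60(w, x, y):
    # Eq. (60): sigma'(z)*x / (sigma(z)*(1-sigma(z))) * (sigma(z) - y)
    s = sigma(w * x)
    ds = s * (1 - s)                          # sigma'(z)
    return ds * x / (s * (1 - s)) * (s - y)

# Arbitrary test point
w, x, y = -0.4, 2.0, 0.0
eps = 1e-6
numeric = (bce_sigmoid(w + eps, x, y) - bce_sigmoid(w - eps, x, y)) / (2 * eps)
simplified = x * (sigma(w * x) - y)   # eq. (60) after cancellation
```

Here `numeric`, `grad_eq60(w, x, y)`, and `simplified` all agree to within finite-difference error.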
Could we use that and continue from there? Substituting $\tanh'(\omega x) = 1 - \tanh^2(\omega x)$, we get:
$$ \frac{\tanh'(\omega x)x}{\tanh(\omega x)(1-\tanh(\omega x))}(\tanh(\omega x)-y) \\= \frac{x - x\tanh^2(\omega x)}{\tanh(\omega x)(1-\tanh(\omega x))}(\tanh(\omega x)-y) \\= \frac{x - x\hat{y}^2}{\hat{y}(1-\hat{y})}(\hat{y}-y) \\= \frac{(\hat{y}+1)x(\hat{y}-y)}{\hat{y}} $$
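The closed form above can be checked numerically against a finite-difference estimate (a minimal sketch; the values of $w$, $x$, $y$ are arbitrary, chosen so that $\omega x > 0$ and hence $\tanh(\omega x) \in (0,1)$, which keeps both logs in the BCE defined):

```python
import math

def loss(w, x, y):
    # BCE with a tanh output unit: requires tanh(w*x) in (0, 1)
    yhat = math.tanh(w * x)
    return -(y * math.log(yhat) + (1 - y) * math.log(1 - yhat))

def grad(w, x, y):
    # Closed form derived above: dE/dw = (yhat + 1) * x * (yhat - y) / yhat
    yhat = math.tanh(w * x)
    return (yhat + 1) * x * (yhat - y) / yhat

# Arbitrary test point with w*x > 0
w, x, y = 0.7, 1.3, 1.0
eps = 1e-6
numeric = (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2 * eps)
# numeric and grad(w, x, y) should agree to finite-difference precision
```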
I can't find this result anywhere, though :)
Update
Given the first answer, which gives $(1+\hat{y})(1-\hat{y})$ as the derivative of $\tanh$, we arrive at the same result:
$$ \frac{\tanh'(\omega x)x}{\tanh(\omega x)(1-\tanh(\omega x))}(\tanh(\omega x)-y) \\= \frac{x(1+\hat{y})(1-\hat{y})}{\hat{y}(1-\hat{y})}(\hat{y}-y) \\= \frac{(\hat{y}+1)x(\hat{y}-y)}{\hat{y}} $$
Solution
Let $a$ be the output of an activation function such as sigmoid or tanh.
The derivative of the sigmoid is then $a(1-a)$, whereas the derivative of $\tanh$ is $(1+a)(1-a)$.
Just follow the derivation for the sigmoid, replacing the derivative of the sigmoid with that of $\tanh$.
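Putting it all together, one gradient descent step $\Delta \omega = -\eta \, \partial E / \partial \omega$ with the derived gradient might look like this (a toy scalar sketch; the learning rate, initial $w$, and data are arbitrary, and $\omega x$ must stay positive so that $\tanh(\omega x) \in (0,1)$):

```python
import math

def gd_step(w, x, y, eta=0.1):
    # One gradient descent step for BCE with a tanh output unit,
    # using dE/dw = (yhat + 1) * x * (yhat - y) / yhat from the derivation
    yhat = math.tanh(w * x)
    grad = (yhat + 1) * x * (yhat - y) / yhat
    return w - eta * grad          # delta_w = -eta * dE/dw

# Toy run: with label y = 1, yhat = tanh(w*x) should move toward 1
w, x, y = 0.5, 1.0, 1.0
for _ in range(50):
    w = gd_step(w, x, y)
# tanh(w * x) is now close to the label 1
```

Note that because tanh ranges over $(-1, 1)$ while BCE assumes outputs in $(0, 1)$, this only makes sense when the network output stays positive; in practice one would rescale tanh to $(0,1)$ or use a sigmoid output for BCE.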