Question

Similar to this post, I need the gradient descent step for tanh, but now with binary cross-entropy (BCE) as the loss.

So we have

$$ \Delta w = -\eta \frac{\partial E}{\partial w} $$
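
In code this weight update is a one-liner; here is a minimal NumPy sketch (the names `update`, `dE_dw`, and `eta` are mine, not from the question):

```python
import numpy as np

def update(w, dE_dw, eta=0.1):
    """One gradient-descent step: Delta w = -eta * dE/dw."""
    return w - eta * dE_dw

w_new = update(np.array([0.3, -0.2]), np.array([1.0, 0.5]))
```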

Now we have BCE:

$$ E = -\big(y\log(\hat{y}) + (1-y)\log(1-\hat{y})\big) $$

My output is $\hat{y} = \tanh(w \cdot x)$, where $x$ is my input vector and $y$ is the corresponding label. So:

$$ \frac{\partial E}{\partial w} = \frac{\partial}{\partial w}\Big[-\big(y\log(\tanh(wx)) + (1-y)\log(1-\tanh(wx))\big)\Big] $$
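
To make the chain rule explicit (these are the standard partial derivatives of BCE and of the tanh unit, written in the question's notation):

$$ \frac{\partial E}{\partial w} = \frac{\partial E}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial w}, \qquad \frac{\partial E}{\partial \hat{y}} = \frac{\hat{y}-y}{\hat{y}(1-\hat{y})}, \qquad \frac{\partial \hat{y}}{\partial w} = \tanh'(wx)\,x $$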

Now, on this website they do something similar for the ordinary sigmoid and arrive at (their eq. 60):

$$ \frac{\sigma'(z)\,x}{\sigma(z)(1-\sigma(z))}(\sigma(z)-y) $$

Could we use that and continue from there? Taking the derivative, we get:

$$ \frac{\tanh'(wx)\,x}{\tanh(wx)(1-\tanh(wx))}(\tanh(wx)-y) \\ = \frac{x - x\tanh^2(wx)}{\tanh(wx)(1-\tanh(wx))}(\tanh(wx)-y) \\ = \frac{x - x\hat{y}^2}{\hat{y}(1-\hat{y})}(\hat{y}-y) \\ = \frac{(\hat{y}+1)\,x\,(\hat{y}-y)}{\hat{y}} $$
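
As a sanity check, here is a small NumPy sketch comparing this analytic gradient against central finite differences (the function names and toy numbers are mine; it assumes $wx > 0$ so that $\hat{y} \in (0,1)$ and the logs are defined):

```python
import numpy as np

def bce_tanh_loss(w, x, y):
    """BCE loss with y_hat = tanh(w . x); assumes w . x > 0 so y_hat is in (0, 1)."""
    y_hat = np.tanh(w @ x)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def analytic_grad(w, x, y):
    """The result derived above: (1 + y_hat) * x * (y_hat - y) / y_hat."""
    y_hat = np.tanh(w @ x)
    return (1 + y_hat) * x * (y_hat - y) / y_hat

# Toy values (chosen so that w . x > 0 and the logs are defined).
w, x, y = np.array([0.3, -0.2]), np.array([1.0, 0.5]), 1.0

# Central finite differences as a reference.
eps = 1e-6
num_grad = np.zeros_like(w)
for i in range(len(w)):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[i] += eps
    w_minus[i] -= eps
    num_grad[i] = (bce_tanh_loss(w_plus, x, y) - bce_tanh_loss(w_minus, x, y)) / (2 * eps)

print(analytic_grad(w, x, y))  # matches num_grad to ~1e-8
print(num_grad)
```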

However, I can't find this result anywhere I look :)

Update

Given the first answer, which gives the tanh derivative as $(1 + \hat{y})(1 - \hat{y})$, we arrive at the same result:

$$ \frac{\tanh'(wx)\,x}{\tanh(wx)(1-\tanh(wx))}(\tanh(wx)-y) \\ = \frac{x(1+\hat{y})(1-\hat{y})}{\hat{y}(1-\hat{y})}(\hat{y}-y) \\ = \frac{(\hat{y}+1)\,x\,(\hat{y}-y)}{\hat{y}} $$


Solution

Let $a$ be the output of an activation function such as sigmoid or tanh.

The derivative of the sigmoid is then $a(1-a)$, whereas the derivative of tanh is $(1+a)(1-a)$.
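
For completeness, the tanh case follows from the standard identity, with $a = \tanh(z)$:

$$ \tanh'(z) = 1 - \tanh^2(z) = (1 + \tanh(z))(1 - \tanh(z)) = (1+a)(1-a) $$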

Just follow the sigmoid derivation, but replace the derivative of the sigmoid with that of tanh.
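
Following that recipe, here is a minimal sketch of one update step (the function name and `eta` are illustrative; it assumes $wx > 0$ so that the BCE terms are defined):

```python
import numpy as np

def tanh_bce_step(w, x, y, eta=0.1):
    """One gradient-descent step for BCE with a tanh output unit.

    Same as the sigmoid derivation, but with sigma'(z) = a(1 - a)
    replaced by tanh'(z) = (1 + a)(1 - a), which yields
    dE/dw = (1 + a) * x * (a - y) / a.
    """
    a = np.tanh(w @ x)                  # model output y_hat
    grad = (1 + a) * x * (a - y) / a    # dE/dw from the derivation above
    return w - eta * grad               # Delta w = -eta * dE/dw
```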
