
One is: $$J=-\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[y_{k}^{i}\log\big((h_{\theta}(x^{i}))_k\big)+(1-y_{k}^{i})\log\big(1-(h_{\theta}(x^{i}))_k\big)\Big]$$

The other one is: $$J=-\frac{1}{m}\sum_{i=1}^{m}\Big[y^{i}\log(a^{i})+(1-y^{i})\log(1-a^{i})\Big]$$

As far as I can see, these two equations are not equal. How can both be used to calculate the cost function?

Also, one of them uses the function $h$, which is $a$ of the output layer, whereas the other uses $a$ ($a$ is $f(w*x)$, where $f$ is the activation function). In the book "Pattern Recognition and Machine Learning", Bishop uses $a$ for both equations, but the other source I took the equations from uses $h$. Using different $a$ values and using just one of them (namely $h$, which is $a$ of the output layer) are totally different things.
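To make the comparison concrete, here is a minimal NumPy sketch of both formulas. It assumes that $h_{\theta}(x^{i})$ and $a^{i}$ both denote the output-layer activations, and the function names and sample numbers are made up for illustration. Under that assumption, the first formula with a single output unit ($K=1$) gives the same value as the second.

```python
import numpy as np

def cost_multi_output(Y, H):
    """First formula: Y and H have shape (m, K), K output units."""
    m = Y.shape[0]
    # Sum over both examples i and output units k, then average over m.
    return -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m

def cost_single_output(y, a):
    """Second formula: y and a have shape (m,), one output unit."""
    m = y.shape[0]
    return -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a)) / m

# Hypothetical labels and output activations for m = 3 examples.
y = np.array([1.0, 0.0, 1.0])
a = np.array([0.9, 0.2, 0.6])

j_single = cost_single_output(y, a)
# Treat the same data as a network with K = 1 output unit.
j_multi = cost_multi_output(y.reshape(-1, 1), a.reshape(-1, 1))

print(j_single, j_multi)
```

With $K>1$ the first formula simply adds one such term per output unit, so the two expressions differ only in how many output units are summed over.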

Both sources are reliable, so what am I missing?

Licensed under: CC-BY-SA with attribution