Problem

I'm computing cost in the following way:

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(y, y_)
cost = tf.reduce_mean(cross_entropy)

For the first cost, I am getting 0.693147, which is to be expected for binary classification when the parameters/weights are initialized to 0.
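(For reference, 0.693147 is ln 2: with all weights at zero every logit is 0, softmax assigns 0.5 to each class, and the cross-entropy of a one-hot label is -log(0.5). A minimal numpy check, with illustrative names:)

import numpy as np

logits = np.array([0.0, 0.0])                  # zero weights -> zero logits
probs = np.exp(logits) / np.exp(logits).sum()  # [0.5, 0.5]
label = np.array([1.0, 0.0])                   # one-hot label
print(-np.sum(label * np.log(probs)))          # 0.6931471805599453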

I am using one_hot labels.

However, after completing a training epoch using stochastic gradient descent, I am finding a cost greater than 1.

Is this to be expected?

Solution

The following piece of code does essentially what TF's softmax_cross_entropy_with_logits function does (cross-entropy on the softmaxed y_ and y):

import numpy as np

def softmax(x):
    # Subtract the max for numerical stability; the result sums to 1.
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

def crossentropy(true, pred):
    # Log loss as on the Kaggle wiki: clip predictions away from 0 and 1,
    # then average the per-class binary cross-entropy terms.
    epsilon = 1e-15
    true = np.asarray(true)
    pred = np.clip(pred, epsilon, 1 - epsilon)

    ll = -np.sum(
        true * np.log(pred) +
        (1 - true) * np.log(1 - pred)
    ) / len(true)

    return ll

==

true = [1., 0.]
pred = [5.0, 0.5]

true = softmax(true)
pred = softmax(pred)

print(true)
print(pred)

print(crossentropy(true, pred))

==

[ 0.73105858  0.26894142]
[ 0.98901306  0.01098694]
1.22128414101
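
Note that for two classes the per-class average above reduces to the plain categorical cross-entropy, -sum(true * log(pred)); a quick numpy check with the values from the output above:

import numpy as np

true = np.array([0.73105858, 0.26894142])
pred = np.array([0.98901306, 0.01098694])
print(-np.sum(true * np.log(pred)))   # ~1.2213, the same value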

As you can see, there is no reason why cross-entropy on binary classification cannot be > 1, and it's not hard to come up with such an example.
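
For a quick numeric illustration with plain numpy: the loss contributed by a single example is -log(p) for the probability p assigned to its true class, and that already exceeds 1 as soon as p drops below 1/e ≈ 0.37:

import numpy as np

for p in [0.5, 1 / np.e, 0.2, 0.05]:
    print(p, -np.log(p))   # 0.693..., 1.0, 1.609..., 2.995...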

** The cross-entropy above is calculated as in https://www.kaggle.com/wiki/LogarithmicLoss, and the softmax as in https://en.wikipedia.org/wiki/Softmax_function.

UPD: there is a great explanation on SO of what it means when log loss is > 1: https://stackoverflow.com/a/35015188/1166478

License: CC-BY-SA with attribution. Not affiliated with datascience.stackexchange.