Question

In a neural network written in PyTorch, we have defined and used the following custom loss, which should replicate the behavior of the cross-entropy loss:

import math
import torch

def my_loss(output, target):
    # batchSize and classes are assumed to be defined globally
    global classes

    v = torch.empty(batchSize)
    xi = torch.empty(batchSize)

    # softmax denominator: sum of exponentials over all classes
    for j in range(0, batchSize):
        v[j] = 0
        for k in range(0, len(classes)):
            v[j] += math.exp(output[j][k])

    # negative log-probability of the target class for each sample
    for j in range(0, batchSize):
        xi[j] = -math.log(math.exp(output[j][target[j]]) / v[j])

    loss = torch.mean(xi)
    print(loss)
    loss.requires_grad = True
    return loss

but it doesn't converge to acceptable accuracies.


Solution

You should only use PyTorch's implementations of math functions; otherwise torch does not know how to differentiate them. Replace math.exp with torch.exp and math.log with torch.log.
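For example, a minimal sketch of the same loss using only torch operations (keeping your loop structure for clarity; it assumes output holds raw logits of shape (batchSize, numClasses) and target holds integer class indices):

import torch

def my_loss(output, target):
    per_sample = []
    for j in range(output.shape[0]):
        # softmax denominator for sample j, built from torch.exp so autograd can track it
        denom = torch.exp(output[j]).sum()
        # negative log-probability of the correct class
        per_sample.append(-torch.log(torch.exp(output[j, target[j]]) / denom))
    return torch.stack(per_sample).mean()

Note that loss.requires_grad = True is no longer needed: because every operation is a torch operation, the result already carries the computation graph.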

Also, use vectorised operations instead of loops as often as you can, because that will be much faster.
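A fully vectorised sketch (same assumptions about output and target as above) removes both loops; torch.logsumexp also avoids overflow for large logits:

def my_loss(output, target):
    # log of the softmax denominator, one value per row: log(sum_k exp(output[j, k]))
    log_denom = torch.logsumexp(output, dim=1)
    # logit of the correct class for each sample
    correct = output[torch.arange(output.shape[0]), target]
    # mean negative log-likelihood over the batch: -log(exp(correct) / denom)
    return (log_denom - correct).mean()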

Finally, as far as I can see, you are merely reimplementing a log loss in PyTorch; is there any reason not to use the one implemented by default? (see here or here)
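For reference, the built-in version is a one-liner; torch.nn.functional.cross_entropy expects raw logits and integer class labels, exactly like the custom function above:

import torch.nn.functional as F

loss = F.cross_entropy(output, target)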

[EDIT]: If, after removing the math operations and implementing a vectorised version of the loss, it still does not converge, here are a few pointers on how to debug it (a small sanity-check sketch follows the list):

  • Check that the loss is correct by calculating its value manually and comparing it with what the function outputs
  • Compute the gradient manually and check that it matches the gradients PyTorch stores in the inputs' .grad attributes after running loss.backward() (more info here)
  • Monitor the loss and the gradients over a few iterations to check that everything behaves as expected during training
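Putting the first two points together, a quick sanity check might look like this (a sketch; my_loss is the vectorised version above, and the shapes are arbitrary):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
output = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
target = torch.tensor([0, 2, 1, 1])

# 1. the custom loss should match the built-in cross entropy
print(my_loss(output, target).item(), F.cross_entropy(output, target).item())

# 2. gradients should flow back to the logits after backward()
my_loss(output, target).backward()
print(output.grad)  # should be non-None and finite

# 3. numerically verify the analytic gradient (requires double-precision inputs)
print(torch.autograd.gradcheck(lambda o: my_loss(o, target), (output,)))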
Licensed under: CC-BY-SA with attribution