Question

My desired output is not 1-hot encoding, but like a 10 D vector: [1, 0, 1, 0, 1, 0, 0, 1, 1, 1] and the input is like the normal MNIST data set.

I want to use TensorFlow to build a model to learn this, then which loss function should I choose?


Solution

If your classes are not mutually exclusive, then you just use multiple sigmoid outputs (instead of the softmax function seen in typical MNIST classifiers). Each output is a separate probability that the network assigns to membership in that class.
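As a minimal sketch of that architecture (the hidden-layer size and optimizer here are assumptions, not part of the question), a Keras model for MNIST-sized inputs with 10 independent sigmoid outputs might look like:

```python
import tensorflow as tf

# Assumed architecture: 28x28 inputs, one hidden layer, and 10 independent
# sigmoid units instead of a softmax over 10 mutually exclusive classes.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    # No softmax: each unit is a separate binary membership probability.
    tf.keras.layers.Dense(10, activation="sigmoid"),
])
# Binary cross-entropy is applied per output, matching the multi-label setup.
model.compile(optimizer="adam", loss=tf.keras.losses.BinaryCrossentropy())

out = model(tf.zeros((2, 28, 28)))
```

Because the outputs are independent sigmoids, each of the 10 values lies in (0, 1) on its own; they do not need to sum to 1 the way softmax outputs would.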

For a matching loss function, TensorFlow provides the built-in tf.nn.sigmoid_cross_entropy_with_logits. Note that it works on the logits (the inputs to the sigmoid function) for numerical stability and efficiency. The linked documentation explains the maths involved.

You will still want a sigmoid function on the output layer for when you read off the predictions, but you apply the loss function above to the input of the sigmoid function. Note this is not a requirement of your problem: you could easily write a loss function that works from the sigmoid outputs; the TensorFlow built-in is simply written against the logits to get better numerical behaviour and a small speed boost.
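Putting those two points together (the logit and label values below are made up for illustration, using the 10-D target vector from the question):

```python
import numpy as np
import tensorflow as tf

# One example with 10 independent binary labels, as in the question.
labels = tf.constant([[1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
# Hypothetical raw network outputs *before* the sigmoid (the logits).
logits = tf.constant([[2.0, -1.0, 0.5, -3.0, 1.5, -0.5, -2.0, 3.0, 0.2, 1.0]])

# The loss is computed from the logits, not from sigmoid(logits).
per_label_loss = tf.nn.sigmoid_cross_entropy_with_logits(
    labels=labels, logits=logits)
loss = tf.reduce_mean(per_label_loss)

# The sigmoid is only applied when reading off predictions.
probs = tf.sigmoid(logits)
preds = tf.cast(probs > 0.5, tf.float32)
```

Here `preds` recovers the target vector because every logit is positive exactly where the corresponding label is 1; each of the 10 outputs is thresholded independently.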

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange