Question

I have spent five days trying to implement the following algorithm as a loss function for my neural network, but it has been impossible for me. Impossible because, when I finally implemented it, I got the error:

No gradients provided for any variable: ['conv1_1/kernel:0', 'conv1_1/bias:0', 'conv1_2/kernel:0'

I'm implementing a semantic segmentation network to identify brain tumours. The network always reports very good accuracy, about 94.5%, but when I plot the real mask next to the network output, I see that the effective accuracy is 0%, because it doesn't produce a single white dot:

[image: network output plotted next to the real mask]

By the way, the mask image only contains the values 0.0 (black) and 1.0 (white).

So, I have decided to implement my own loss function, which has to do the following:

  1. In a nutshell, add the mask image to the output and count how many 2.0 values appear in this sum. Compare that count with the count of 1.0 values in the mask image, and take the difference as the error.

More detailed:

  1. Convert the model's output values into 0.0 or 1.0.
  2. Add that output to the mask (whose values are also 0.0 or 1.0).
  3. Count how many 2.0 values are in this sum.
  4. Count how many 1.0 values are in the mask.
  5. Return the difference between them (see the sketch after this list).
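
For reference, here is a minimal sketch of those steps written with plain TensorFlow ops (illustrative only; the function and tensor names are assumptions, and rounding and equality comparisons do not provide useful gradients, so this behaves as a metric rather than a trainable loss):

import tensorflow as tf

def count_difference(mask, output):
    # 1. Convert the model output to 0.0 or 1.0
    binary_output = tf.round(output)
    # 2. Add the binarised output to the mask
    summed = mask + binary_output
    # 3. Count how many 2.0 values are in the sum
    twos = tf.reduce_sum(tf.cast(tf.equal(summed, 2.0), tf.float32))
    # 4. Count how many 1.0 values are in the mask
    ones = tf.reduce_sum(tf.cast(tf.equal(mask, 1.0), tf.float32))
    # 5. Return the difference between them
    return tf.abs(ones - twos)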

My question is:

Is there already a TensorFlow function that does that?

Right now, I get that network output because I'm using the Euclidean distance. I have implemented the loss function using tf.norm:

def loss(model, x, y):
  global output
  output = model(x)
  return tf.norm(y - output)

And then I use this loss function to compute the gradients with a gradient tape:

def grad(model, inputs, targets):
  with tf.GradientTape() as tape:
      loss_value = loss(model, inputs, targets)      
  return loss_value, tape.gradient(loss_value, model.trainable_variables)
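
For completeness, these gradients are then applied in the training step along these lines (a minimal sketch; the optimizer choice here is just an example and not part of the original code):

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)

loss_value, grads = grad(model, inputs, targets)
optimizer.apply_gradients(zip(grads, model.trainable_variables))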

Maybe what I am trying to do is another kind of distance.


Solution

Your intuition about counting the 2s is pretty good. For this type of problem, you can use the Dice loss function, which uses your idea but slightly differently.

DSC = (2 · Σᵢ pᵢ gᵢ) / (Σᵢ pᵢ + Σᵢ gᵢ)

The values of the predicted and ground-truth components (each pixel, pᵢ and gᵢ above) are either 0 or 1, representing whether the pixel belongs to the label (value of 1) or not (value of 0). Therefore, the denominator is the total count of labeled pixels in both the prediction and the ground truth, and the numerator is the count of correctly predicted boundary pixels, because the sum increments only when the prediction and the ground truth match (both have value 1).
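
For instance, on a toy pair of flattened binary masks (hypothetical values, NumPy used only for illustration):

import numpy as np

y_true = np.array([0., 1., 1., 0., 1.])   # ground-truth mask, 3 foreground pixels
y_pred = np.array([0., 1., 0., 0., 1.])   # binarised prediction, 2 foreground pixels

intersection = np.sum(y_true * y_pred)    # 2 correctly predicted foreground pixels
dsc = 2.0 * intersection / (np.sum(y_true) + np.sum(y_pred))
print(dsc)  # 2*2 / (3 + 2) = 0.8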

As cited here:

For example, if two sets A and B overlap perfectly, DSC gets its maximum value to 1. Otherwise, DSC starts to decrease, getting to its minimum value to 0 if the two sets don't overlap at all. Therefore, the range of DSC is between 0 and 1, the larger the better. Thus we can use 1-DSC as Dice loss to maximize the overlap between two sets.

The Dice loss is commonly used in segmentation tasks because it is resilient to class imbalance (e.g. too much background).

You can implement it in TensorFlow Keras like this:

from tensorflow.keras import backend as K

def dice_coef(y_true, y_pred):
    # Flatten both masks so the sums run over every pixel
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    # 1.0 is the smoothing term that keeps the denominator away from 0
    return (2.0 * intersection + 1.0) / (K.sum(y_true_f) + K.sum(y_pred_f) + 1.0)

def dice_coef_loss(y_true, y_pred):
    return 1 - dice_coef(y_true, y_pred)

(In the implementation we use a smoothing term, here set to 1; the lower this value, the better. The smoothing term is there to avoid dividing by 0 if the input has no labeled pixels in it.)
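
To plug this into training, you could either compile a Keras model with it or drop it into the custom loop from the question (a sketch; the optimizer choice is just an example):

# With model.fit:
model.compile(optimizer="adam", loss=dice_coef_loss, metrics=[dice_coef])

# Or in the custom GradientTape loop, replacing tf.norm:
def loss(model, x, y):
    return dice_coef_loss(y, model(x))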

Licensed under: CC-BY-SA with attribution