Question

I created a convolutional network with 5 conv blocks; let's discuss the issue based on conv blocks 4 & 5.

Conv Block 4

Input image size: 28 * 28; padding: 1 (30 * 30 after padding); filter size: 3 * 3; convolution output: 28 * 28; max-pool output: 14 * 14

Conv Block 5

Input image size: 14 * 14; padding: 1 (16 * 16 after padding); filter size: 3 * 3; convolution output: 14 * 14; max-pool output: 7 * 7
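For reference, these sizes follow the standard formula out = (in + 2 * pad - kernel) / stride + 1. A quick sanity check in Python, assuming stride-1 convolutions and 2 * 2 max pooling with stride 2, which matches the halving above:

```python
def conv_out_size(in_size, kernel=3, pad=1, stride=1):
    # Standard convolution output-size formula.
    return (in_size + 2 * pad - kernel) // stride + 1

def pool_out_size(in_size, window=2, stride=2):
    # Non-overlapping max-pool output size.
    return (in_size - window) // stride + 1

# Conv block 4: 28 -> 28 (conv) -> 14 (pool)
assert conv_out_size(28) == 28 and pool_out_size(28) == 14
# Conv block 5: 14 -> 14 (conv) -> 7 (pool)
assert conv_out_size(14) == 14 and pool_out_size(14) == 7
```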

Fully connected layer

  1. The output above (7 * 7) is flattened and passed to the fully connected layer, and the gradient of the loss with respect to the flattened input (dL/dO) has been backpropagated; gradient-checker results are fine. (A toy sketch of this flatten/un-flatten step follows this list.)

  2. Now I am trying to backpropagate through the convolution layers as below.
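As a toy sketch of the flatten/un-flatten in step 1 (shapes from the question; variable names are illustrative, not from the actual code):

```python
import numpy as np

conv5_out = np.random.randn(7, 7)        # max-pool output of conv block 5
flat = conv5_out.reshape(-1)             # 49-vector fed to the FC layer

dL_dflat = np.random.randn(49)           # dL/dO arriving from the FC layer
dL_dconv5_out = dL_dflat.reshape(7, 7)   # un-flatten to resume conv backprop
```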

Backprop - Conv Block 5

Reverse max pool: 7 * 7 (dL/dO) to 14 * 14 (dL/dO)

Conv 5 filter derivative, 3 * 3 (dL/dF): (14 * 14 dL/dO) convolved with (16 * 16 conv 5 input image with padding)
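A minimal NumPy sketch of the reverse max-pool step, assuming 2 * 2 non-overlapping pooling and that the winning (argmax) positions were recorded on the forward pass; function names are illustrative:

```python
import numpy as np

def max_pool_2x2(x):
    # Forward 2x2 max pool; also records a mask of the winning positions.
    h, w = x.shape
    out = np.zeros((h // 2, w // 2))
    mask = np.zeros_like(x, dtype=bool)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            win = x[i:i + 2, j:j + 2]
            r, c = np.unravel_index(np.argmax(win), win.shape)
            out[i // 2, j // 2] = win[r, c]
            mask[i + r, j + c] = True
    return out, mask

def reverse_max_pool(dL_dO, mask):
    # Upsample each gradient value into its 2x2 window, then zero out
    # every position that did not win the max on the forward pass.
    up = np.repeat(np.repeat(dL_dO, 2, axis=0), 2, axis=1)
    return up * mask

x = np.random.randn(14, 14)
pooled, mask = max_pool_2x2(x)              # 7 * 7 forward output
dL_dpooled = np.random.randn(7, 7)          # upstream gradient
dL_dx = reverse_max_pool(dL_dpooled, mask)  # back to 14 * 14
```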

I have issue with the below

Conv 5 input derivative:

input size 16 * 16 with padding (dL/dX) = full convolution of (14 * 14 dL/dO) with the (3 * 3 filter rotated 180 degrees)
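To see the shape arithmetic, here is a sketch of that step using scipy (an assumption; the question's implementation is presumably hand-rolled). scipy.signal.convolve2d flips its kernel internally, so passing the filter unrotated with mode='full' computes exactly this "filter rotated 180 degrees" full convolution, assuming the forward pass used cross-correlation as most CNN code does:

```python
import numpy as np
from scipy.signal import convolve2d

dL_dO = np.random.randn(14, 14)  # upstream gradient (illustrative values)
F = np.random.randn(3, 3)        # conv 5 filter (illustrative values)

# Full convolution; convolve2d's built-in kernel flip supplies the
# 180-degree rotation, so F is passed as-is.
dL_dX_padded = convolve2d(dL_dO, F, mode='full')
print(dL_dX_padded.shape)  # (16, 16), since 14 + 3 - 1 = 16
```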

The question is: if the conv 5 input derivative is 16 * 16, how can I reverse max pool back into conv block 4, given that the conv 4 max-pool output was only 14 * 14?

I think the conv 5 input derivative should be 14 * 14 so we can reverse max pool to 28 * 28 and backpropagate through conv block 4.

Please help me resolve this; I am sure I am missing something.

Thanks


Solution

You have derivatives for a 16x16 image, including derivatives for the padding - but these are worthless. Padding is not a trainable parameter, and it is not a function of previous layers, which means computing the padding's derivatives is not necessary to backpropagate through any layer. For these reasons you can just cut out the middle 14x14 matrix and pass it to the max-pool layer.
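Concretely, the "cut out the middle 14x14" step is a single slice. A minimal NumPy sketch, assuming padding of 1 on each side:

```python
import numpy as np

pad = 1
dL_dX_padded = np.random.randn(16, 16)  # gradient w.r.t. the padded input

# Drop the rows/columns that correspond to the zero padding; the
# interior 14x14 region lines up with conv block 4's max-pool output.
dL_dX = dL_dX_padded[pad:-pad, pad:-pad]
print(dL_dX.shape)  # (14, 14)
```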

To visualise it even better, imagine a 1D convolution. Your max pool outputs 14 values and you pad it to the left and to the right with a 0, so the padded vector looks like:

[0, x1, x2, ..., x14, 0]

Now notice that during backprop, once you have calculated your 16 derivatives, reversing the concatenation simply means splitting your vector 1-14-1 and passing the 14 middle values to the max pool. You could pass the leftover derivatives to the padding if, for example, it were trainable, but that usually isn't the case. That's why you don't have to calculate the padding derivatives at all.
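In code, reversing that concatenation is just the 1-14-1 split (a toy NumPy illustration of the 1D example):

```python
import numpy as np

d = np.arange(16.0)  # the 16 derivatives of the padded 1D vector

left_pad, middle, right_pad = d[:1], d[1:15], d[15:]  # the 1-14-1 split
# 'middle' (14 values) goes back through the max pool; the pad
# derivatives are discarded because the zeros are not trainable.
print(middle.shape)  # (14,)
```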
