Question

I'm in the process of learning and understanding different neural networks. I now have a good grasp of feed-forward neural networks and back-propagation for them, and I'm currently learning convolutional neural networks. I understand their forward-propagation, but I'm having issues understanding their back-propagation. There is a very good resource explaining the convolutional layer, but I can't understand the back-propagation from it.

In my understanding, according to the back-propagation algorithm for feed-forward neural networks/multi-layer perceptrons, suppose I have the following input (with items $i$) and filter (with items $w$), giving the output (with items $o$):

$$\begin{pmatrix}i_1^1 & i_2^1 & i_3^1\\ i_4^1 & i_5^1 & i_6^1\\ i_7^1 & i_8^1 & i_9^1\end{pmatrix} * \begin{pmatrix}w_1^1 & w_2^1\\ w_3^1 & w_4^1\end{pmatrix} = \begin{pmatrix}o_1^1 & o_2^1\\ o_3^1 & o_4^1\end{pmatrix}$$

So if we want to calculate, for example, how much $w_1^1$ affected the cost $C$, we need to know how much $w_1^1$ affected the output $o^1$, and how much $o^1$ affected the cost $C$, which gives the following equation:

$$\frac{\partial C}{\partial w_1^1} = \frac{\partial o^1}{\partial w_1^1}\frac{\partial C}{\partial o^1}$$

In my thinking, to calculate $\frac{\partial o^1}{\partial w_1^1}$ we have to trace back how the output was obtained from $w_1^1$.

To get $o_1^1$, we multiplied $w_1^1$ by $i_1^1$; to get $o_2^1$, we multiplied $w_1^1$ by $i_2^1$; to get $o_3^1$, we multiplied $w_1^1$ by $i_4^1$; and to get $o_4^1$, we multiplied $w_1^1$ by $i_5^1$.
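To make this concrete for myself, here is a minimal NumPy sketch of the forward pass I have in mind (the variable names `inp` and `w1` and the use of a "valid" cross-correlation are my own assumptions, just matching the matrices above):

```python
import numpy as np

# 3x3 input and 2x2 filter, matching the matrices above
inp = np.arange(1.0, 10.0).reshape(3, 3)    # i_1 ... i_9
w1 = np.array([[0.1, 0.2],
               [0.3, 0.4]])                 # w_1 ... w_4

# "valid" cross-correlation: slide the 2x2 filter over the 3x3 input
o1 = np.zeros((2, 2))
for r in range(2):
    for c in range(2):
        o1[r, c] = np.sum(inp[r:r + 2, c:c + 2] * w1)

print(o1)  # o_1^1 ... o_4^1; w_1^1 multiplies i_1, i_2, i_4, i_5 respectively
```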

Calculating $\frac{\partial C}{\partial o^1}$ depends on how the output is connected to the next layer. If it is another convolutional layer, then we have to work out how each output item is connected to the next layer's outputs, which will be through their connecting weights.

So consider an example where we apply a 2x2 filter to $o^1$ to get the final output $o^2$ (which is a single 1x1 output):

$$\begin{pmatrix}o_1^1 & o_2^1\\ o_3^1 & o_4^1\end{pmatrix} * \begin{pmatrix}w_1^2 & w_2^2\\ w_3^2 & w_4^2\end{pmatrix} = \begin{pmatrix}o_1^2\end{pmatrix}$$

Where in my thinking the back-propagation for $w_1^2$ is:

$$\frac{\partial C}{\partial w_1^2} = \frac{\partial o^2}{\partial w_1^2}\frac{\partial C}{\partial o^2} = o_1^1 * 2(o^2_1 - y_1)$$,

and the back-propagation for $w_1^1$ is:

$$\frac{\partial C}{\partial w_1^1} = \frac{\partial o^1}{\partial w_1^1}\frac{\partial C}{\partial o^1}$$

Where:

$$\frac{\partial o^1}{\partial w_1^1} = (i_1^1 + i_2^1 + i_4^1 + i_5^1)$$

And:

$$\frac{\partial C}{\partial o^1} = \frac{\partial o_1^2}{\partial o_1^1}\frac{\partial C}{\partial o_1^2} + \frac{\partial o_1^2}{\partial o_2^1}\frac{\partial C}{\partial o_1^2} + \frac{\partial o_1^2}{\partial o_3^1}\frac{\partial C}{\partial o_1^2} + \frac{\partial o_1^2}{\partial o_4^1}\frac{\partial C}{\partial o_1^2}$$

So:

$$\frac{\partial C}{\partial o^1} = w_1^2 * 2(o_1^2 - y_1) + w_2^2 * 2(o_1^2 - y_1) + w_3^2 * 2(o_1^2 - y_1) + w_4^2 * 2(o_1^2 - y_1)$$
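As a sanity check on hand derivations like this, I compare them against a finite-difference estimate of the gradient. Here is a rough NumPy sketch of that check, assuming the cost is $C = (o_1^2 - y_1)^2$ (the variable names and the central-difference step size are my own choices):

```python
import numpy as np

def cost(inp, w1, w2, y):
    """Two convolution layers from the example above, then squared-error cost."""
    o1 = np.zeros((2, 2))
    for r in range(2):
        for c in range(2):
            o1[r, c] = np.sum(inp[r:r + 2, c:c + 2] * w1)  # first 2x2 filter
    o2 = np.sum(o1 * w2)          # second 2x2 filter gives a single number
    return (o2 - y) ** 2          # cost C

inp = np.arange(1.0, 10.0).reshape(3, 3)
w1 = np.array([[0.1, 0.2], [0.3, 0.4]])
w2 = np.array([[0.5, -0.2], [0.1, 0.3]])
y, eps = 1.0, 1e-6

# Central-difference estimate of dC/dw_1^1
w1_plus, w1_minus = w1.copy(), w1.copy()
w1_plus[0, 0] += eps
w1_minus[0, 0] -= eps
grad_numeric = (cost(inp, w1_plus, w2, y) - cost(inp, w1_minus, w2, y)) / (2 * eps)
print(grad_numeric)  # compare with whatever the hand derivation predicts
```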

Am I right? Because as I'm reading through the article above, it seems completely different.


Solution

Note that a CNN is a feed-forward neural network. Thus, if you understand how to perform backpropagation in feed-forward neural networks, you have it for CNNs.

A convolution layer can be understood as a fully connected layer, with the constraints that several edge weights are identical and many edge weights are set to 0.
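To illustrate this equivalence, here is a minimal NumPy sketch (my own construction, not from the original answer) that writes the 2x2 filter over a 3x3 input from the question as a 4x9 dense weight matrix in which most entries are zero and the four filter weights are repeated (tied) across rows:

```python
import numpy as np

inp = np.arange(1.0, 10.0).reshape(3, 3)      # i_1 ... i_9
w = np.array([[0.1, 0.2], [0.3, 0.4]])        # w_1 ... w_4

# Dense matrix of the equivalent fully connected layer: one row per output,
# one column per flattened input pixel. Entries are either 0 or a shared w.
W_dense = np.zeros((4, 9))
for r in range(2):                             # output row
    for c in range(2):                         # output column
        for a in range(2):                     # filter row
            for b in range(2):                 # filter column
                W_dense[2 * r + c, 3 * (r + a) + (c + b)] = w[a, b]

o_fc = W_dense @ inp.flatten()                 # fully connected view

# Direct convolution (valid cross-correlation) for comparison
o_conv = np.array([[np.sum(inp[r:r + 2, c:c + 2] * w) for c in range(2)]
                   for r in range(2)])

print(np.allclose(o_fc, o_conv.flatten()))     # True: identical outputs
```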

You can also build a pooling layer in this way. For example, an average pooling layer is nothing but a specific convolution layer with fixed weights.
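Concretely, 2x2 average pooling with stride 2 can be written as a convolution whose kernel is fixed at 1/4 in every position; a small NumPy sketch under that assumption:

```python
import numpy as np

x = np.arange(16.0).reshape(4, 4)
kernel = np.full((2, 2), 0.25)     # fixed, untrained weights

# 2x2 average pooling = strided convolution with the constant kernel
pooled = np.array([[np.sum(x[r:r + 2, c:c + 2] * kernel)
                    for c in range(0, 4, 2)]
                   for r in range(0, 4, 2)])
print(pooled)   # each entry is the mean of one 2x2 block
```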

For max-pooling, use the fact that $\max\{x,y\} = \frac{x+y+|x-y|}{2}$.
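Nesting that identity gives the maximum over a whole pooling window; a small sketch of my own (plain Python) for a 2x2 window:

```python
def max2(x, y):
    # max{x, y} = (x + y + |x - y|) / 2
    return (x + y + abs(x - y)) / 2

def maxpool_2x2(a, b, c, d):
    # maximum of a 2x2 window by nesting the two-element identity
    return max2(max2(a, b), max2(c, d))

print(maxpool_2x2(3.0, 7.0, 5.0, 1.0))  # 7.0
```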
