Correct number of biases in CNN
22-10-2019
Question
What is the correct number of biases in a simple convolutional layer? The question has been discussed before, but I'm still not quite sure about the answer.
Say we have a (3, 32, 32) image and apply a (32, 5, 5) filter, just like in Question about bias in Convolutional Networks.
The total number of weights in the layer's kernels is trivially $3 \times 5 \times 5 \times 32$. Now let us count the biases. The link above states that the total count of biases is $1 \times 32$, which makes sense: the weights are shared among all output cells, so it is natural to have only one bias for each output feature map as a whole.
But on the other hand, we apply the activation function to each cell of the output feature map separately, so if we have a different bias for each cell, they do not sum together, and the count $O \times O \times 32$ instead of $1 \times 32$ makes sense too (here $O$ is the output feature map's height or width).
As far as I can see, the first approach is the more widely used, but I have also seen the second in some papers.
So, $(3 \times 5 \times 5 + 1) \times 32$ or $(3 \times 5 \times 5 + O \times O) \times 32$?
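For concreteness, both counts can be worked out in a few lines of plain Python, assuming a 'valid' convolution (no padding, stride 1) so that $O = 32 - 5 + 1 = 28$:

```python
# Parameter counts for a conv layer on a (3, 32, 32) input
# with 32 filters of size 5x5, no padding, stride 1.
in_channels, num_filters, k = 3, 32, 5
out_size = 32 - k + 1  # O = 28, the output feature-map height/width

weights = in_channels * k * k * num_filters  # shared kernel weights: 2400

tied = weights + 1 * num_filters                     # one bias per feature map
untied = weights + out_size * out_size * num_filters  # one bias per output cell

print(tied)    # 2432
print(untied)  # 27488
```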
Solution
As you say, both approaches are used. It's called tied biases if you use one bias per convolutional filter/kernel ($(3 \times 5 \times 5 + 1) \times 32$ parameters overall in your example) and untied biases if you use one bias per kernel and output location ($(3 \times 5 \times 5 + O \times O) \times 32$ parameters overall in your example).
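The difference is easy to see in how the bias is broadcast over the pre-activation output. A minimal NumPy sketch (the conv output here is just a random stand-in for the actual convolution result, using $O = 28$ from the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pre-activation output of the conv layer: 32 feature maps of size 28x28.
conv_out = rng.standard_normal((32, 28, 28))

# Tied biases: one scalar per filter, broadcast over all output locations.
tied_bias = rng.standard_normal((32, 1, 1))
tied_result = conv_out + tied_bias      # broadcasts to (32, 28, 28)

# Untied biases: one scalar per filter AND per output location.
untied_bias = rng.standard_normal((32, 28, 28))
untied_result = conv_out + untied_bias  # elementwise, no broadcasting needed

print(tied_bias.size)    # 32 bias parameters
print(untied_bias.size)  # 25088 bias parameters
```

Either way, the result has the same shape; only the number of learnable bias parameters differs.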
Untied biases increase the capacity of your model, so they can be a good idea if you are underfitting. But in that case, using tied biases with more filters and/or layers might also help; see https://harmdevries89.wordpress.com/2015/03/27/tied-biases-vs-untied-biases/.
OTHER TIPS
When I tried to output my CNN weights from Theano's graph, I got one bias vector for each layer.