Question

What is the correct number of biases in a simple convolutional layer? The question has been discussed before, but I'm still not quite sure about it.

Say we have a (3, 32, 32) image and apply a (32, 5, 5) filter, just like in Question about bias in Convolutional Networks

The total number of weights in the layer kernel is trivially $3 \times 5 \times 5 \times 32$. Now let us count the biases. The link above states that the total count of biases is $1 \times 32$, which makes sense because the weights are shared among all output cells, so it is natural to have only one bias for each output feature map as a whole.

But on the other hand: we apply the activation function to each cell of the output feature map separately, so if we have a different bias for each cell, they never sum together; hence the number $O \times O \times 32$ instead of $1 \times 32$ makes sense too (here $O$ is the output feature map height or width).

As far as I can see, the first approach is widely used, but I have also seen the second approach in some papers.

So, $(3 \times 5 \times 5 + 1) \times 32$ or $(3 \times 5 \times 5 + O \times O) \times 32$?
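For concreteness, the two counts can be worked out numerically. This is a sketch under the usual assumptions of stride 1 and no padding ("valid" convolution), which for a 32x32 input and a 5x5 kernel gives a 28x28 output map:

```python
# Parameter counts for a conv layer on a (3, 32, 32) input with 32 filters
# of size 5x5, assuming stride 1 and no padding (illustrative sketch).
in_channels, out_channels, k = 3, 32, 5
in_size = 32
out_size = in_size - k + 1  # 28: "valid" convolution output size

weights = in_channels * k * k * out_channels  # 3*5*5*32 = 2400 shared weights

tied = weights + 1 * out_channels                      # one bias per filter
untied = weights + out_size * out_size * out_channels  # one bias per output cell

print(tied)    # 2432
print(untied)  # 27488
```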


Solution

As you say, both approaches are used. It is called tied biases if you use one bias per convolutional filter/kernel ($(3 \times 5 \times 5 + 1) \times 32$ parameters overall in your example) and untied biases if you use one bias per kernel and output location ($(3 \times 5 \times 5 + O \times O) \times 32$ parameters overall in your example).

Untied biases increase the capacity of your model, so they can be a good idea if you are underfitting. But in this case using tied biases and more filters and/or layers might also help, see https://harmdevries89.wordpress.com/2015/03/27/tied-biases-vs-untied-biases/.
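The practical difference between the two schemes is just the shape of the bias tensor added to the convolution output. A minimal NumPy sketch (array names and sizes are illustrative, using the 28x28 output maps from the example above):

```python
import numpy as np

# Pretend conv output of shape (out_channels, H, W) = (32, 28, 28).
conv_out = np.zeros((32, 28, 28))

# Tied: one scalar per feature map, broadcast over all spatial locations.
tied_bias = np.ones((32, 1, 1))
# Untied: one independent scalar per output cell.
untied_bias = np.ones((32, 28, 28))

# Both broadcast to the full output shape; they differ only in parameter count.
assert (conv_out + tied_bias).shape == (32, 28, 28)
assert (conv_out + untied_bias).shape == (32, 28, 28)
print(tied_bias.size, untied_bias.size)  # 32 25088
```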

OTHER TIPS

When I tried to output my CNN weights from Theano's graph, I got one bias vector for each layer.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange