Question

I'm reading a book titled Python Deep Learning, and in the section "Convolutional layers in deep learning" in Chapter 5, the following is written:

One more important point to make is that convolutional networks should generally have a depth equal to a number which is iteratively divisible by 2, such as 32, 64, 96, 128, and so on. This is important when using pooling layers, such as the max-pool layer, since the pooling layer (if it has size (2,2)) will divide the size of the input layer, similarly to how we should define "stride" and "padding" so that the output image will have integer dimensions. In addition, padding can be added to ensure that the output image size is the same as the input.
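To make the quoted claim about stride and padding concrete, here is a small sketch of the standard output-size formula for a convolution or pooling layer (the formula itself is not stated in the quote, so this is my own illustration, and the function name `output_size` is mine):

```python
def output_size(in_size, kernel, stride=1, padding=0):
    """Spatial output size of a conv/pool layer along one dimension:
    out = (in - kernel + 2 * padding) // stride + 1
    """
    return (in_size - kernel + 2 * padding) // stride + 1

# A (2,2) max-pool with stride 2 halves a 32x32 input to 16x16:
print(output_size(32, kernel=2, stride=2))          # 16
# "Same" padding for a 3x3 conv with stride 1 keeps 32 -> 32:
print(output_size(32, kernel=3, stride=1, padding=1))  # 32
```

This shows why the spatial dimensions are usually chosen so the division comes out as an integer; note the formula never involves the depth.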

As far as I know, the width and height should be divisible by 2 in order to use a (2,2) pooling layer. However, I don't understand why the depth must also be divisible by 2. The pooling layer operates only on the two spatial dimensions (width and height), and it is applied to each channel of the depth dimension separately, right?

Why should the depth also be divisible by 2?
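To illustrate my understanding, here is a minimal NumPy sketch of 2x2 max pooling (my own toy implementation, not from the book): only the height and width need to be even, while the depth can be any number, e.g. 3.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling applied independently to every channel.

    x has shape (depth, height, width); only height and width must be
    even -- the depth passes through unchanged.
    """
    d, h, w = x.shape
    assert h % 2 == 0 and w % 2 == 0, "pooling only constrains H and W"
    # Group each 2x2 spatial window, then take the max over it per channel.
    return x.reshape(d, h // 2, 2, w // 2, 2).max(axis=(2, 4))

x = np.arange(3 * 4 * 4, dtype=float).reshape(3, 4, 4)  # depth 3, odd
y = max_pool_2x2(x)
print(y.shape)  # (3, 2, 2): spatial dims halved, depth untouched
```

As far as I can tell, a depth of 3 (or any other number) poses no problem for the pooling operation itself, which is exactly why the book's remark confuses me.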


Licensed under: CC-BY-SA with attribution