Adapting ZFNet to 224x224 images using a 7x7 filter

https://datascience.stackexchange.com/questions/86640

17-12-2020

Question

I am building a model based on ZFNet in TensorFlow 2.0, using the Petal images dataset. The images are of size 224x224x3. My question is about the first layer (conv2d) with filter size = 7, a stride of 3, and padding of 0. Using the formula (n + 2p - f)/S + 1, I get an output dimension of 109.5. With the above-mentioned values, what dimension will TensorFlow return for the first layer? And secondly, how can I adjust the parameter values so that it returns a whole number?

Reference formula: (n + 2p - f)/2 + 1

Reference calculation: (224 + 0 - 7)/2 + 1 = 109.5

Thanks.

Solution

As per the formula for the feature map dimension:

$$feature_{dim} = \frac{n+2p-f}{S} + 1$$

With the values:

n = 224
p = 0
f = 7
S = 3

$$feature_{dim} = \frac{224 + 2 \cdot 0 - 7}{3} + 1 \approx 73.33$$

As you've guessed, this cannot be the size of the feature map, since it is not an integer.

TensorFlow rounds it down to 73.
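To make the rounding explicit, here is a minimal sketch in plain Python (no TensorFlow needed). The helper name `conv_output_size` is made up for illustration. It assumes a convolution with explicit symmetric padding `p` and applies the formula above with the division floored.

```python
def conv_output_size(n: int, f: int, s: int, p: int = 0) -> int:
    """Output length along one axis for input size n, filter f, stride s, padding p."""
    if n + 2 * p < f:
        raise ValueError("filter is larger than the padded input")
    # Integer division floors the result, which drops any partial last step.
    return (n + 2 * p - f) // s + 1


print(conv_output_size(224, 7, 3))  # 73
print(conv_output_size(224, 7, 2))  # 109
```

The second call uses the stride of 2 from the reference calculation in the question, where the unfloored formula gives 109.5.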

If you rely only on the formula, you miss the underlying concept: the kernel slides over the input in discrete steps, so the feature map dimension has to be an integer. What happens is that the kernel, sliding with a stride of 3, cannot take another full step near the far edge, so it leaves out the trailing pixels (here, just the last one) and never reaches the other edge.
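The sliding can be sketched by listing where the kernel is allowed to start. This is an illustration in plain Python, not how TensorFlow is implemented, and `kernel_positions` is a hypothetical helper name. It assumes no padding and counts only placements that fit entirely inside the input.

```python
def kernel_positions(n: int, f: int, s: int) -> list[int]:
    """Start indices of every kernel placement that fits fully inside the input."""
    return list(range(0, n - f + 1, s))


starts = kernel_positions(224, 7, 3)
last_covered = starts[-1] + 7 - 1      # index of the last pixel the kernel touches

print(len(starts))                     # 73 placements, so 73 output values
print(starts[-1], last_covered)        # 216 222
print(224 - 1 - last_covered)          # 1 pixel at the far edge is never visited
```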

If you want the formula to give an integer feature map size while keeping the filter size fixed at 7, then your stride is:

$$S = \frac{217}{N - 1}$$

where N is your desired output size and 217 = 224 - 7.

If you choose N to be 32 or 8, you'll end up with a stride of 7 or 31, respectively. It's better to choose S = 7 so that less of the image is skipped. Still, it doesn't matter much in practice, as TensorFlow handles the non-integer case itself and does not raise an error.
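As a quick check of which strides work, the sketch below enumerates every stride that divides 217 exactly, together with the output size it produces. The function name `integer_strides` is an assumption for illustration, and the padding is taken to be 0 as in the question.

```python
def integer_strides(n: int, f: int, p: int = 0) -> list[tuple[int, int]]:
    """All (stride, output_size) pairs for which (n + 2p - f) / stride is a whole number."""
    span = n + 2 * p - f
    return [(s, span // s + 1) for s in range(1, span + 1) if span % s == 0]


print(integer_strides(224, 7))  # [(1, 218), (7, 32), (31, 8), (217, 2)]
```

Since 217 = 7 x 31, the only exact strides are 1, 7, 31 and 217.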

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange