Question

I was reading the paper EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks and couldn't get my head around this sentence:

Intuitively, the compound scaling method makes sense because if the input image is bigger, then the network needs more layers to increase the receptive field and more channels to capture more fine-grained patterns on the bigger image.

  1. For a bigger image, why does the network need more layers to increase the receptive field? What does increasing the receptive field mean: increasing its width/height? If so, can't we do that directly, without increasing the number of layers in the network?

  2. Does "fine-grained patterns" refer to the noisy shapes we see when visualizing convolution outputs?

I feel like I am missing / misunderstanding something evident.

Solution

The receptive field of a unit in a feature map is the region of the input image (the set of input pixels) that can influence that unit's value. There's a nice Distill article about how to calculate receptive field size for your filters (with a nice visualization of receptive field size) and an interactive calculator here if you're only curious about how receptive field size grows with changes to depth and filter size.

Increases to receptive field size typically come from adding layers and from increasing the kernel size. A larger kernel operates on more pixels, which grows the receptive field. Increasing the depth of your network means adding more convolutional layers. These downstream filters operate on the feature maps produced by the earlier conv layers of your net, so each unit in the added layers indirectly sees a larger patch of the input (if this isn't clear, this is a good guide). This is also why a bigger image calls for more layers: the same object spans more pixels at higher resolution, so a receptive field of fixed size covers a smaller fraction of it. The Distill article also goes into detail about how other operations affect receptive field size.
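To make the growth concrete, here is a minimal sketch of the usual receptive field recurrence for a plain stack of conv/pooling layers (no dilation), computed along one spatial axis. The function name `receptive_field` and the `(kernel_size, stride)` layer description are my own choices for illustration, not something from the paper or any library.

```python
def receptive_field(layers):
    """Receptive field size (in input pixels, along one axis) of a unit
    after a stack of layers.

    layers: list of (kernel_size, stride) tuples, first layer first.
    Assumes no dilation; padding does not change the size, only the position.
    """
    rf = 1    # a single input pixel "sees" only itself
    jump = 1  # spacing, in input pixels, between adjacent units of the current map
    for kernel_size, stride in layers:
        # Each extra kernel tap reaches `jump` input pixels further out.
        rf += (kernel_size - 1) * jump
        # Striding spreads the units of the next feature map further apart.
        jump *= stride
    return rf


print(receptive_field([(3, 1)] * 2))  # two 3x3 convs, stride 1  -> 5
print(receptive_field([(3, 1)] * 4))  # four 3x3 convs, stride 1 -> 9
print(receptive_field([(7, 1)]))      # one 7x7 conv             -> 7
print(receptive_field([(3, 2)] * 4))  # four 3x3 convs, stride 2 -> 31
```

With stride 1, every extra 3x3 layer adds only 2 pixels, so covering a larger image with the same kind of layers takes more of them, or a larger kernel or stride.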

As for the claim that more channels capture more fine-grained patterns, it's in line with the intuition that more filters give the network more ways to learn specific features of your data. See articles on visualizing convolutional filters for a sense of what types of features are captured (this tutorial on object detection links to a nice visualization).

Hope this helps!

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange