Question

The internet is full of pictures like this:

[Figure: the familiar CNN feature-hierarchy visualization: low-level features (edges), mid-level features (noses, eyes, mouths), high-level features (faces)]

But how are the second, third, etc. CNN layers able to extract features when the features have already been extracted by the previous layers?

For example, the mid-level features in the picture include a nose. When we apply this "nose" filter, the output feature map will be an image without the nose, right? That feature map is then passed to the next CNN layer, but how can it extract the "high-level feature" if the feature map given to it no longer contains the nose? By this logic, the more layers we stack in a CNN, the less meaningful the data extracted in the later layers would be.


Solution

I think you might be misunderstanding the phrase "extract" here. Think of it as "gets activated by" instead.

For example, the "nose filter" gets activated by inputs that look like a human nose (more precisely: it gets activated by activation maps of previous layers which correspond to a nose in the input image). Simply put, the subsequent higher-level feature maps in your example, which might encode a human face, then get activated if the previous layers contain activated feature maps for a human nose, mouth, eye, etc. This article explains it well.
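To make "gets activated by" concrete, here is a minimal 1-D numpy sketch (not real CNN code: the filter is hand-picked rather than learned, and the names are illustrative). Note that the filter's output is a map of *where* the pattern occurs; nothing is removed from the input.

```python
import numpy as np

# A toy 1-D "image": a flat region, then a step (an edge), then flat.
signal = np.array([0, 0, 0, 1, 1, 1], dtype=float)

# A hand-picked "edge filter": responds where the input changes.
edge_filter = np.array([-1.0, 1.0])

# Valid cross-correlation: slide the filter and take dot products.
# The result is an ACTIVATION MAP, one response per position.
activation = np.array([
    np.dot(signal[i:i + 2], edge_filter)
    for i in range(len(signal) - 1)
])

print(activation)  # peaks exactly where the edge sits
```

The activation map is all zeros except at the edge's location, where the filter "fires". The original signal is untouched; the next layer simply receives this new map as its input.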

Like NNs in general, this process is loosely inspired by how visual perception works in humans and other animals.

Other tips

In the picture above, the first layer learns to detect only different edges. The second layer then uses those edges to build a nose, an eye, etc.; think of multiple lines from the first layer together creating a nose structure. Finally, using noses, eyes, and lips, the last layer builds a face-like structure. That's how it works: the first layer learns edges, some combinations of those edges form a nose that is detected in the second layer, and then nose-and-eye-like structures form a face that is detected in the last layer.
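The stacking described above can be sketched in a few lines of numpy (again hand-picked filters, purely illustrative): layer 2 never sees the raw image, only layer 1's edge map, and it composes those edges into a larger pattern.

```python
import numpy as np

# Toy 1-D "image" containing a bright bar on a dark background.
image = np.array([0, 0, 1, 1, 0, 0], dtype=float)

# Layer 1: an edge filter; its activation map marks WHERE edges occur.
layer1 = np.abs(np.convolve(image, [1, -1], mode="valid"))
# One edge on each side of the bar.

# Layer 2 reads layer 1's map, not the image. This hand-picked filter
# responds to TWO edges a fixed distance apart, i.e. a "bar detector"
# composed from the edge detector's output.
bar_filter = np.array([1.0, 0.0, 1.0])
layer2 = np.array([
    np.dot(layer1[i:i + 3], bar_filter)
    for i in range(len(layer1) - 2)
])

print(layer2)  # strongest where the edge pair (the bar) sits
```

The same composition continues upward in a real CNN: edge maps feed part detectors (nose, eye), and part maps feed object detectors (face), with all filters learned from data instead of hand-picked.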

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange