Question

I'm currently studying Convolutional Neural Networks. Why must a CNN have a fixed input size?

I know that it's possible to overcome this problem (with fully convolutional neural networks, etc.), and I also know that it is due to the fully connected layers placed at the end of the network.

But why? I can't understand what the presence of the fully connected layers implies, and why it forces us to have a fixed input size.


Solution

I think the answer to this question is weight sharing in convolutional layers, which you don't have in fully-connected ones. In convolutional layers you only train the kernel, which is then convolved with the input of that layer. If you make the input larger, you would still use the same kernel; only the size of the output would increase accordingly. The same is true for pooling layers.

So, for convolutional layers the number of trainable weights is (mostly) independent of input and output size, but output size is determined by input size and vice versa.
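
A minimal sketch of this point, using PyTorch (my choice of framework; the answer doesn't name one): the layer's parameter count depends only on the kernel, while the output size simply tracks the input size.

```python
import torch
import torch.nn as nn

# A convolutional layer only stores its kernel weights: here 3 input
# channels, 16 output channels, and a 3x3 kernel, i.e. 16*3*3*3 + 16 biases.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
print(sum(p.numel() for p in conv.parameters()))  # 448, regardless of input size

# The same layer accepts inputs of different spatial sizes;
# only the output size grows along with the input size.
print(conv(torch.randn(1, 3, 32, 32)).shape)    # torch.Size([1, 16, 32, 32])
print(conv(torch.randn(1, 3, 100, 100)).shape)  # torch.Size([1, 16, 100, 100])
```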

In fully-connected layers you train a weight connecting every dimension of the input with every dimension of the output, so if you made the input larger, you would require more weights. But you cannot just make up new weights; they would need to be trained.

So, for fully-connected layers the weight matrix determines both input and output size.
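
To illustrate (again a PyTorch sketch, an assumption on my part): the weight matrix of a fully-connected layer has shape (output size, input size), so an input of any other size simply doesn't fit.

```python
import torch
import torch.nn as nn

# A fully connected layer stores one weight per (input dim, output dim) pair,
# so its weight matrix fixes both sizes: here 10 x 100.
fc = nn.Linear(in_features=100, out_features=10)
print(fc.weight.shape)  # torch.Size([10, 100])

print(fc(torch.randn(1, 100)).shape)  # works: torch.Size([1, 10])
# fc(torch.randn(1, 200))  # raises a shape-mismatch RuntimeError
```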

Since CNNs often have one or more fully-connected layers at the end, there is a constraint on what the input dimension to the fully-connected layers has to be, which in turn fixes the input size of the last convolutional layer, which in turn fixes the input size of the convolutional layer before it, and so on, all the way back to the input layer.
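
Putting it together in one sketch (same hypothetical PyTorch setup): the final fully-connected layer pins down every size before it, so the network only accepts one input size.

```python
import torch
import torch.nn as nn

# A toy CNN whose final fully connected layer expects 16 * 16 * 16 features.
# Working backwards: the flatten must produce 4096 values, so the last conv
# output must be 16 x 16 x 16, so (with this pooling) the input must be 3 x 32 x 32.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),          # halves the spatial size: 32 -> 16
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),
)

print(model(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])
# model(torch.randn(1, 3, 64, 64))  # fails: flatten yields 16*32*32 features
```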

Other tips

That's actually not true: CNNs don't have to have a fixed-size input. It is possible to build CNN architectures that can handle variable-length inputs. Most standard CNNs are designed for a fixed-size input, because they contain elements of their architecture that don't generalize well to other sizes, but this is not inherent.

For example, standard CNN architectures often use many convolutional layers followed by a few fully connected layers. The fully connected layer requires a fixed-length input: if you trained a fully connected layer on inputs of size 100, there is no obvious way to handle an input of size 200, because you only have weights for 100 inputs and it's not clear what weights to use for 200 inputs.

That said, the convolutional layers themselves can be used on variable-length inputs. A convolutional layer has a kernel of fixed size (say, 3x3) that is applied across the entire input image; the weights you learn during training determine this kernel. Once you've learned the kernel, it can be used on an image of any size, so the convolutional layers can adapt to arbitrary-sized inputs. It's when you follow a convolutional layer with a fully connected layer that you get into trouble with variable-size inputs.

You might be wondering, if we used a fully convolutional network (i.e., only convolutional layers and nothing else), could we then handle variable-length inputs? Unfortunately, it's not quite that easy. We typically need to produce a fixed-length output (e.g., one output per class). So, we will need some layer somewhere that maps a variable-length input to a fixed-length output.

Fortunately, there are methods in the literature for doing that. Thus, it is possible to build networks that can handle variable-length inputs. For instance, you can train and test on images of multiple sizes; or train on images of one size and test on images of another size. For more information on those architectures, see e.g., https://stackoverflow.com/q/36262860/781723, https://stats.stackexchange.com/q/250258/2921, https://stackoverflow.com/q/57421842/781723, https://stackoverflow.com/q/53841509/781723, https://stackoverflow.com/q/53114882/781723, https://docs.fast.ai/layers.html#AdaptiveConcatPool2d, and so on.
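
One such method is global (adaptive) pooling, as in the AdaptiveConcatPool2d link above. A minimal sketch with PyTorch's standard nn.AdaptiveAvgPool2d (my choice; the links describe several variants): it squeezes feature maps of any spatial size down to a fixed length before the fully connected layer.

```python
import torch
import torch.nn as nn

# Fully convolutional stem followed by adaptive average pooling: whatever
# spatial size the feature maps have, pooling reduces them to 1 x 1,
# so the fully connected classifier always sees exactly 16 features.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),  # (N, 16, H, W) -> (N, 16, 1, 1) for any H, W
    nn.Flatten(),
    nn.Linear(16, 10),
)

# The same network now handles variable-size inputs.
print(model(torch.randn(1, 3, 32, 32)).shape)   # torch.Size([1, 10])
print(model(torch.randn(1, 3, 123, 77)).shape)  # torch.Size([1, 10])
```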

That said, these methods are not yet as widely used as they could be. Many common neural network architectures don't use these methods, perhaps because it is easier to resize images to a fixed size and not worry about this, or perhaps because of historical inertia.

Input size determines the overall number of parameters of the neural network (through the fully connected layers). During training, each parameter of the model specializes to "learn" some part of the signal. This implies that once the number of parameters changes, the whole model must be retrained. That's why we can't afford to let the input shape change.

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange