How do CNNs use a model and find the object(s) desired?

https://datascience.stackexchange.com/questions/15236

16-10-2019
Question

Background: I'm studying CNNs outside of my undergraduate CS course on ML. I have a few questions related to CNNs.

1) When training a CNN, we desire tightly bounded/cropped images of the desired classes, correct? For example, if we were trying to recognize dogs, we would use thousands of images of tightly cropped dogs. We would also feed images of non-dogs, correct? These images are scaled to a specific size, e.g. 255x255.

2) Let's say training is complete. Our model's accuracy seems sufficient, with no problems. From here, let's have a large, HD image of a non-occluded dog running through a field with various obstacles. With a typical NN and some data, we just take the model, cross it with some input, and bam it's going to output some class. How will the CNN view this large image, and then 'find' the dog? Do we run some type of preprocessing on the image to partition it, and feed the partitions?

Solution

A very detailed explanation could be given for this question, but I will try to answer it in as few words as possible.

1) Tightly cropping the images isn't a strict requirement, although the images fed to the network are normally resized to one common input size. Color isn't essential either: a dog can be recognized in a B&W image or an RGB image, because a convolutional network relies mostly on features such as edges, textures, and shapes. Resizing gives the network a fixed input size, and scaling the pixel values (for example, dividing 8-bit values by 255) limits them to between 0 and 1.
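The resizing and pixel scaling mentioned above can be sketched in a few lines. This is only an illustration, not the pipeline of any particular library: the function name `preprocess`, the 224x224 target size, and the nearest-neighbour resizing are assumptions chosen to keep the example dependency-free (real pipelines usually use an image library with better interpolation).

```python
import numpy as np

def preprocess(image, size=(224, 224)):
    """Resize with nearest-neighbour sampling and scale pixels to [0, 1].

    `image` is assumed to be an 8-bit array of shape (height, width, channels).
    The target size is an arbitrary choice for this sketch.
    """
    h, w = image.shape[:2]
    out_h, out_w = size
    # For every output row/column, pick the source row/column it samples from.
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    resized = image[rows[:, None], cols]
    # 8-bit pixel values lie in [0, 255]; dividing by 255 maps them to [0, 1].
    return resized.astype(np.float32) / 255.0

# A random stand-in for a large HD photo (1080 x 1920, RGB).
rng = np.random.default_rng(0)
hd_image = rng.integers(0, 256, size=(1080, 1920, 3), dtype=np.uint8)

x = preprocess(hd_image)
print(x.shape, x.dtype, float(x.min()), float(x.max()))
```

Whatever the size of the original photo, the array handed to the network has the same shape and value range.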

2) Once you have trained your CNN model, it has learned features such as edges, textures, and shapes that let it recognize a dog in an image. Because the same convolutional filters are applied at every position, and pooling discards exact positions, the model is largely translation invariant: no matter where you position a dog in the image, it is still a dog and has the same features. How does the model recognize it? It checks for the features of a dog, learned during training, wherever the dog is in the image and whatever the dog is doing. The convolutional layers themselves can slide over an image of any size, but a classifier that ends in fully connected layers expects a fixed input size, so a large image is normally resized to that size first.
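A toy illustration of the translation-invariance idea, under loud assumptions: the 3x3 `pattern` below is a hand-made stand-in for a learned filter, the scenes are synthetic, and `correlate2d_valid` and `scene_with_pattern` are hypothetical helper names. A real CNN stacks many learned filters and is only approximately invariant, but the mechanism is the same: the filter is slid over every position, and a global max pool keeps the strongest response regardless of where it occurred.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the response map."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    # Multiply each window by the kernel element-wise and sum the products.
    return np.einsum("ijkl,kl->ij", windows, kernel)

# Hand-made "feature detector"; in a trained CNN this would be learned.
pattern = np.array([[1.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0],
                    [1.0, 0.0, 1.0]])

def scene_with_pattern(top, left, shape=(12, 16)):
    """Return an empty scene with the pattern pasted at (top, left)."""
    scene = np.zeros(shape)
    scene[top:top + 3, left:left + 3] = pattern
    return scene

scene_a = scene_with_pattern(1, 2)    # pattern near the top-left
scene_b = scene_with_pattern(7, 11)   # same pattern near the bottom-right

map_a = correlate2d_valid(scene_a, pattern)
map_b = correlate2d_valid(scene_b, pattern)

# The location of the strongest response follows the pattern around ...
peak_a = np.unravel_index(map_a.argmax(), map_a.shape)
peak_b = np.unravel_index(map_b.argmax(), map_b.shape)

# ... while a global max pool gives the same value for both scenes.
pooled_a = map_a.max()
pooled_b = map_b.max()
print(peak_a, peak_b, pooled_a, pooled_b)
```

The response map shifts along with the pattern, while the pooled value is identical for both scenes, which is the property the answer describes.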

For a more in-depth understanding, you can refer to the following resources:

http://neuralnetworksanddeeplearning.com/chap6.html

http://cs231n.github.io/convolutional-networks/

Licensed under: CC-BY-SA with attribution
