Question

I have seen many examples online regarding the MNIST dataset, but they're all in black and white. In that case, a 2D array can be constructed where the value at each array element represents the intensity of the corresponding pixel. However, what if I want to work with colored images? What's the best way to represent the RGB data?

There's a very brief discussion of it here, which I quote below. However, I still don't get how the RGB data should be organized. Additionally, is there some OpenCV library/command we should use to preprocess the colored images?

the feature detectors in the second convolutional-pooling layer have access to all the features from the previous layer, but only within their particular local receptive field*

*This issue would have arisen in the first layer if the input images were in color. In that case we'd have 3 input features for each pixel, corresponding to red, green and blue channels in the input image. So we'd allow the feature detectors to have access to all color information, but only within a given local receptive field.


Solution

Your R, G, and B pixel values can be broken into 3 separate channels (and in most cases this is done for you). These channels are treated no differently from feature maps in higher levels of the network. Convolution extends naturally to more than 2 dimensions.
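As a minimal sketch (using a random dummy array in place of a real image file), a color image is just a 3D array of shape height x width x 3, and plain slicing separates the channels. OpenCV's cv2.imread returns an array of exactly this shape (note: in BGR order, not RGB), and cv2.split does the same separation:

```python
import numpy as np

# Dummy 28x28 color image; cv2.imread("photo.png") would give you the
# same kind of (H, W, 3) uint8 array, but with channels in BGR order.
img = np.random.randint(0, 256, size=(28, 28, 3), dtype=np.uint8)

# One 2D plane per channel -- each looks just like a greyscale MNIST image.
r, g, b = img[:, :, 0], img[:, :, 1], img[:, :, 2]

print(img.shape)  # (28, 28, 3)
print(r.shape)    # (28, 28)
```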

Imagine the greyscale, single-channel example. Say you have N feature maps to learn in the first layer. Then the output of this layer (and therefore the input to the second layer) will consist of N channels, each of which is the result of convolving one feature map with every window in your image. Having 3 channels in your first layer is no different.
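To make the shapes concrete, here is a naive NumPy sketch of multi-channel "valid" convolution (the function name and sizes are illustrative, not from any library): each filter spans all input channels within its local receptive field, and the N layer-1 outputs are fed to layer 2 exactly as the 3 color channels were fed to layer 1.

```python
import numpy as np

def conv2d_multichannel(x, filters):
    """Valid convolution of a C-channel input with N filters.

    x:       (C, H, W)   input, e.g. C=3 for an RGB image
    filters: (N, C, k, k)
    returns: (N, H-k+1, W-k+1) -- one output channel per feature map
    """
    n, c, k, _ = filters.shape
    _, h, w = x.shape
    out = np.zeros((n, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            window = x[:, i:i + k, j:j + k]  # (C, k, k) receptive field
            # each filter sees ALL channels in this window, then sums over them
            out[:, i, j] = (filters * window).sum(axis=(1, 2, 3))
    return out

rgb = np.random.rand(3, 28, 28)   # 3-channel (color) input
w1 = np.random.rand(8, 3, 5, 5)   # 8 feature maps in layer 1
a1 = conv2d_multichannel(rgb, w1)
print(a1.shape)                   # (8, 24, 24)

# Layer 2 treats those 8 channels exactly like the 3 color channels.
w2 = np.random.rand(16, 8, 3, 3)
a2 = conv2d_multichannel(a1, w2)
print(a2.shape)                   # (16, 22, 22)
```

Real frameworks implement this far more efficiently, but the shape bookkeeping is the same: the channel dimension of a filter always matches the channel count of its input, whether those channels are colors or learned feature maps.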

This tutorial does a nice job of explaining convolution in general:

http://deeplearning.net/tutorial/lenet.html

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange