Training set to train artificial neural network

Question 1

Some comments: Regarding normalization , it is useful, because it makes the picture less sensitive to the lighting (without normalization, the same apple in better lighting would look whiter, and the network would take different values as inputs. You do not want that). However, formula you use limits the values of a variable in an interval [a, b]. In a picture, the values of the digits are usually already in [0, 255], so a different kind is needed. The exact kind of normalization you need depends on the features you extract from the picture (one of the most common is histogram equalization)

However, your main problem is that you can't feed an image directly into the network. The network must take as input some vector describing the picture, not the picture itself (think of this: if you feed the picture into the network, it compares pixel-by-pixel. If the same apple is moved one pixel to the right, all the pixel values are different, although the picture is essentially the same).

Creating a such vector might very difficult, depending on how you want to use it. A simple (but limited) approach would be to crop the apples, take the histogram of each picture, and feed the vector describing the histogram of each picture into the network. In this way the network will most probably classify the images correctly. If this is a school project, or you are just getting started in image processing, try this. However, if you want to find images that just contain an apple somewhere inside the picture, it is much more complicated, and you should look into the tutorials of opencv about 2d feature extraction.

Question 2

For the second category, you need to use images not containing apples. Ideally, it should be a mix of images not containing apples, that is a collection of spaceships, football games, smiling faces, animal and anything else you can think of.

However, the real problem, which you don't mention at all, is how you will extract the features: If it is the first time you are trying this and think you will just feed the whole picture, pixel-by-pixel, into a neural network and get an answer, you are out of luck.

The most important process is how to create the vectors describing the image. That is, how to create a vector from an image, such that the vectors of images containing apples are very different from those of images not containing apples.

However, this is not trivial at all. In fact, is can be a very difficult task, depending on the nature of the images. If first class consists of images showing apples taking up most of the picture, and the second class show only bedrooms, just using the color histogram would work.

However, if images show random scenes which may contain an apple or not, it is very difficult. To get started take a look at the opencv 2d features + homography to find a known object.