Some comments: Normalization is useful because it makes the picture less sensitive to lighting. Without it, the same apple in better lighting would look whiter, and the network would receive different values as inputs, which you do not want. However, the formula you use only rescales the values of a variable into an interval [a, b]. In a picture, the pixel values are usually already in [0, 255], so a different kind of normalization is needed. The exact kind depends on the features you extract from the picture (one of the most common is histogram equalization).
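To make the lighting point concrete, here is a minimal pure-Python sketch of histogram equalization on a flat list of grey values. The function name `equalize_histogram` and the sample pixel lists are made up for illustration, and "better lighting" is modelled very crudely as a constant brightness offset, which real lighting changes only approximate.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of integer grey values in [0, levels - 1]."""
    # Count how often each grey level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution: cdf[v] = number of pixels with value <= v.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    # Smallest non-zero CDF value (count of the darkest level present).
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # Constant image: there is nothing to spread out.
        return list(pixels)

    # Remap so that the CDF is stretched over the full grey range.
    scale = (levels - 1) / (total - cdf_min)
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]


# Hypothetical data: the same "apple" under dim and under brighter lighting.
dark = [10, 10, 20, 30, 30, 40]
bright = [p + 100 for p in dark]

print(equalize_histogram(dark))
print(equalize_histogram(bright))
print(equalize_histogram(dark) == equalize_histogram(bright))
```

Both lists map to the same equalized values, because equalization depends only on the rank order of the grey levels, not on their absolute brightness. If you work with OpenCV, `cv2.equalizeHist` does this for 8-bit single-channel images.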
Your main problem, though, is that you can't feed an image directly into the network. The network must take as input some vector describing the picture, not the picture itself. Think of it this way: if you feed the raw picture into the network, it compares pictures pixel by pixel. If the same apple is moved one pixel to the right, the value at almost every position it covers changes, although the picture is essentially the same.
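The one-pixel shift is easy to check on a toy example. The sketch below uses a made-up 3x5 grey image (nested lists) and two hypothetical helpers, `shift_right` and `flatten`; it only illustrates the argument and is not how you would handle real images.

```python
def shift_right(image, fill=0):
    """Move every row one pixel to the right, padding the left edge with `fill`."""
    return [[fill] + row[:-1] for row in image]


def flatten(image):
    """Turn a 2D image (list of rows) into one flat list of pixel values."""
    return [p for row in image for p in row]


# Hypothetical image: a small bright gradient on a black background.
original = [[0, 60, 120, 180, 0] for _ in range(3)]
shifted = shift_right(original)

# Pixel-by-pixel comparison, which is what a network fed raw pixels would see.
changed = sum(a != b for a, b in zip(flatten(original), flatten(shifted)))
print(changed, "of", len(flatten(original)), "input values changed")

# The set of grey values present is unchanged; only their positions moved.
print(sorted(flatten(original)) == sorted(flatten(shifted)))
```

Most of the raw input values differ after the shift, while a description that ignores position (here simply the sorted grey values) stays the same.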
Creating such a vector might be very difficult, depending on how you want to use it. A simple (but limited) approach would be to crop the apples, take the histogram of each picture, and feed the vector describing that histogram into the network. This way the network will most probably classify the images correctly. If this is a school project, or you are just getting started in image processing, try this. However, if you want to find images that just contain an apple somewhere inside the picture, it is much more complicated, and you should look into the OpenCV tutorials on 2D feature extraction.
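As a rough sketch of the histogram approach, the function below turns a cropped grey image (given as a flat list of values in [0, 255]) into a fixed-length, normalized histogram vector. The name `histogram_features`, the bin count, and the sample crop are assumptions for illustration; with real images you would more likely use `numpy.histogram` or OpenCV's `cv2.calcHist`.

```python
def histogram_features(pixels, bins=16, levels=256):
    """Return a normalized grey-level histogram with `bins` entries."""
    if not pixels:
        raise ValueError("need at least one pixel")

    counts = [0] * bins
    bin_width = levels / bins
    for p in pixels:
        # Clamp the index so the top grey level lands in the last bin.
        index = min(int(p / bin_width), bins - 1)
        counts[index] += 1

    # Divide by the pixel count so crops of different sizes are comparable.
    total = len(pixels)
    return [c / total for c in counts]


# Hypothetical cropped apple, flattened to a list of grey values.
crop = [0, 10, 70, 130, 200, 255, 255, 64]
features = histogram_features(crop, bins=4)
print(features)  # [0.25, 0.25, 0.125, 0.375]
```

The resulting vector always has `bins` entries regardless of the crop size, so it can be used directly as the network's input layer. For colour pictures, one option is to compute a histogram per channel and concatenate them.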