Question

I would like to use a neural network for image classification. I'll start with pre-trained CaffeNet and train it for my application.

How should I prepare the input images?

In this case, all the images are of the same object but with variations (think: quality control). They are at somewhat different scales/resolutions/distances/lighting conditions (and in many cases I don't know the scale). Also, in each image there is an area (known) around the object of interest that should be ignored by the network.

I could (for example) crop the center of each image, which is guaranteed to contain a portion of the object of interest and none of the ignored area; but that seems like it would throw away information, and also the results wouldn't be really the same scale (maybe 1.5x variation).

Dataset augmentation

I've heard of creating more training data by random crop/mirror/etc, is there a standard method for this? Any results on how much improvement it produces to classifier accuracy?


Solution

The idea with neural networks is that they need little pre-processing, since the heavy lifting is done by the network itself, which is in charge of learning the features.

The winners of the Data Science Bowl 2015 have a great write-up regarding their approach, so most of this answer's content was taken from: Classifying plankton with deep neural networks. I suggest you read it, especially the part about pre-processing and data augmentation.

- Resize Images

To handle different sizes, resolutions, and distances, you can simply rescale the largest side of each image to a fixed length, preserving the aspect ratio.
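As a minimal sketch of that idea (numpy-only nearest-neighbor resampling for illustration; the function name is my own, and in practice you would use `cv2.resize` or a similar library call, as shown next):

```python
import numpy as np

def resize_longest_side(image, target=256):
    """Rescale so the longest side equals `target`, preserving aspect ratio.

    Nearest-neighbor resampling with plain numpy indexing, for illustration
    only; a real pipeline would use cv2.resize or skimage.transform.resize.
    """
    h, w = image.shape[:2]
    scale = target / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    # Map each output row/column back to its nearest source row/column
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return image[rows][:, cols]
```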

Another option is to use OpenCV. The following resizes an image to 100 columns (width) and 50 rows (height):

resized_image = cv2.resize(image, (100, 50)) 

Yet another option is to use the scipy module:

small = scipy.misc.imresize(image, 0.5)

- Data Augmentation

Data augmentation almost always improves performance, though how much depends on the dataset. If you want to augment the data to artificially increase the size of the dataset, you can apply the following transformations where they make sense for your data. (For example, for images of houses or people a 180° rotation destroys the label-relevant information, while a horizontal mirror flip does not.)

  • rotation: random with angle between 0° and 360° (uniform)
  • translation: random with shift between -10 and 10 pixels (uniform)
  • rescaling: random with scale factor between 1/1.6 and 1.6 (log-uniform)
  • flipping: yes or no (bernoulli)
  • shearing: random with angle between -20° and 20° (uniform)
  • stretching: random with stretch factor between 1/1.3 and 1.3 (log-uniform)
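The parameter ranges above can be sampled directly. A small sketch (the function name and dictionary keys are my own, not from the write-up) that draws one random set of augmentation parameters:

```python
import numpy as np

def sample_augmentation_params(rng):
    """Draw one set of augmentation parameters from the ranges listed above."""
    def log_uniform(low, high):
        # log-uniform: uniform in log-space, so zooming in and out are symmetric
        return float(np.exp(rng.uniform(np.log(low), np.log(high))))

    return {
        "rotation_deg": float(rng.uniform(0, 360)),  # uniform
        "shift_px": rng.uniform(-10, 10, size=2),    # (dx, dy), uniform
        "zoom": log_uniform(1 / 1.6, 1.6),           # log-uniform
        "flip": bool(rng.random() < 0.5),            # Bernoulli
        "shear_deg": float(rng.uniform(-20, 20)),    # uniform
        "stretch": log_uniform(1 / 1.3, 1.3),        # log-uniform
    }
```

Each training image then gets its own draw, so the network rarely sees the exact same pixels twice.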

You can see the results on the Data Science bowl images.

[Image: pre-processed images]

[Image: augmented versions of the same images]

- Other techniques

These deal with other image properties, such as lighting, and are already tied to the main algorithm, acting more as a simple pre-processing step. Check the full list in the UFLDL Tutorial.

OTHER TIPS

While wacax's answer is complete and really explanatory, I would like to add a couple of things in case anyone stumbles on this answer.

First of all, most scipy.misc image-related functions (imread, imsave, imresize, etc.) have been deprecated in favor of either imageio or skimage.

Secondly, I would strongly recommend the python library imgaug for any augmentation task. It is really easy to use and has virtually all augmentation techniques you might want to use.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange