Data Augmentation recommended pipeline

https://datascience.stackexchange.com/questions/37621

31-10-2019

Question

I want to perform image classification using Keras on a dataset with 50 classes. At the moment, I have only 7 images per class, so I need data augmentation in order to train the model and obtain acceptable accuracy.

I am using the ImageDataGenerator class from Keras, which is recommended for on-the-fly image augmentation (during training). Since the classifier is performing badly, I was wondering whether I should also perform offline augmentation, i.e., enlarge the dataset before training, because I honestly think that 7 images per class is far too few.
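To make the difference between the two approaches concrete, here is a rough back-of-the-envelope sketch. The 50 classes and 7 images per class come from the question; `OFFLINE_COPIES` and `BATCH_SIZE` are made-up values for illustration, and the helper names are hypothetical. It assumes one epoch means one pass over the images stored on disk, so on-the-fly augmentation changes what each image looks like but not how many images an epoch contains, whereas offline augmentation multiplies the stored count.

```python
import math

NUM_CLASSES = 50        # from the question
IMAGES_PER_CLASS = 7    # from the question
OFFLINE_COPIES = 10     # hypothetical: augmented variants saved per original
BATCH_SIZE = 32         # hypothetical batch size


def dataset_size(num_classes, per_class, offline_copies=0):
    """Images on disk: originals plus `offline_copies` saved variants of each."""
    return num_classes * per_class * (1 + offline_copies)


def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to see every stored image once."""
    return math.ceil(num_images / batch_size)


original = dataset_size(NUM_CLASSES, IMAGES_PER_CLASS)
enlarged = dataset_size(NUM_CLASSES, IMAGES_PER_CLASS, OFFLINE_COPIES)

print(original, steps_per_epoch(original, BATCH_SIZE))  # on-the-fly only
print(enlarged, steps_per_epoch(enlarged, BATCH_SIZE))  # after offline step
```

With these assumed values, the original set holds 350 images (11 batches per epoch) and the enlarged set holds 3850 images (121 batches per epoch).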

Is it common practice to perform both types of augmentation (before and during training)? I am planning to use a third-party tool such as imgaug to enlarge the dataset first, save the augmented images to disk, and only then perform real-time augmentation with the ImageDataGenerator class.

In conclusion, the flow would be similar to this:

1. Image pre-processing and offline data augmentation => enlarge the original dataset
2. Training with real-time augmentation => load the dataset and use ImageDataGenerator
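The two-stage flow above can be sketched in plain Python. This is only a toy illustration, not the imgaug or Keras API: images are tiny nested lists, and `hflip`, `shift_right`, `augment_offline`, and `batches_online` are hypothetical helpers standing in for an offline tool and for a real-time generator respectively.

```python
import random


def hflip(img):
    """Mirror a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in img]


def shift_right(img, pixels=1, fill=0):
    """Shift every row right by `pixels`, padding the left edge with `fill`."""
    return [[fill] * pixels + row[:len(row) - pixels] for row in img]


def augment_offline(dataset):
    """Stage 1: return originals plus fixed variants, as if saved to disk."""
    enlarged = []
    for img, label in dataset:
        for variant in (img, hflip(img), shift_right(img)):
            enlarged.append((variant, label))
    return enlarged


def batches_online(dataset, batch_size, rng):
    """Stage 2: yield shuffled batches, randomly flipping images on the fly."""
    order = list(range(len(dataset)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        batch = []
        for i in order[start:start + batch_size]:
            img, label = dataset[i]
            if rng.random() < 0.5:  # random transform, redrawn every epoch
                img = hflip(img)
            batch.append((img, label))
        yield batch


# Toy data (assumption): 2 classes x 7 tiny 2x3 "images".
originals = [([[c, i, 0], [0, i, c]], c) for c in range(2) for i in range(7)]

enlarged = augment_offline(originals)        # stage 1: 3 images per original
rng = random.Random(0)                       # seeded for repeatability
epoch = list(batches_online(enlarged, batch_size=8, rng=rng))  # stage 2

print(len(originals), len(enlarged), len(epoch))
```

In a real setup, stage 1 would write the augmented files to disk and stage 2 would be handled by the training-time generator. One design point worth keeping in mind with this kind of flow: if a validation set is held out, it is generally safer to split the original images first and augment only the training portion, so that variants of the same image do not end up on both sides of the split.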

What do you think?

Thank you.

No correct solution

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange