Question

I am doing image classification with a CNN and I am having trouble building a network that does not overfit. My training set contains 2000 images from 4 classes, while my test set contains 3038 images from the same 4 classes. My CNN is the following:

from keras.models import Sequential
from keras.layers import (Activation, BatchNormalization, Conv2D, Dense,
                          Dropout, Flatten, MaxPooling2D)
from keras import optimizers, regularizers

def Network(input_shape, num_classes, regl2=0.0001, lr=0.0001):

    model = Sequential()

    # C1 Convolutional Layer
    model.add(Conv2D(filters=32, input_shape=input_shape, kernel_size=(3,3),
                     strides=(1,1), padding='valid'))
    model.add(Activation('relu'))
    # Pooling
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))
    # Batch normalisation before passing it to the next layer
    model.add(BatchNormalization())

    # C2 Convolutional Layer
    model.add(Conv2D(filters=64, kernel_size=(3,3), strides=(1,1), padding='valid'))
    model.add(Activation('relu'))
    # Batch normalisation
    model.add(BatchNormalization())

    # C3 Convolutional Layer
    model.add(Conv2D(filters=128, kernel_size=(3,3), strides=(1,1), padding='valid'))
    model.add(Activation('relu'))
    # Batch normalisation
    model.add(BatchNormalization())

    # C4 Convolutional Layer
    model.add(Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding='valid'))
    model.add(Activation('relu'))
    # Pooling
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))
    # Batch normalisation
    model.add(BatchNormalization())

    # C5 Convolutional Layer
    model.add(Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding='valid'))
    model.add(Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding='valid'))
    model.add(Activation('relu'))
    # Pooling
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))
    # Batch normalisation
    model.add(BatchNormalization())

    # C6 Convolutional Layer
    model.add(Conv2D(filters=512, kernel_size=(3,3), strides=(1,1), padding='valid'))
    model.add(Conv2D(filters=512, kernel_size=(3,3), strides=(1,1), padding='valid'))
    model.add(Activation('relu'))
    # Pooling
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))
    # Batch normalisation
    model.add(BatchNormalization())

    # C7 Convolutional Layer
    model.add(Conv2D(filters=512, kernel_size=(3,3), strides=(1,1), padding='valid'))
    model.add(Conv2D(filters=512, kernel_size=(3,3), strides=(1,1), padding='valid'))
    model.add(Activation('relu'))
    # Pooling
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))
    # Batch normalisation
    model.add(BatchNormalization())

    # Flatten
    model.add(Flatten())

    # D1 Dense Layer
    model.add(Dense(4096, kernel_regularizer=regularizers.l2(regl2)))
    model.add(Activation('relu'))
    # Dropout
    model.add(Dropout(0.4))
    # Batch normalisation
    model.add(BatchNormalization())

    # D2 Dense Layer
    model.add(Dense(4096, kernel_regularizer=regularizers.l2(regl2)))
    model.add(Activation('relu'))
    # Dropout
    model.add(Dropout(0.4))
    # Batch normalisation
    model.add(BatchNormalization())

    # D3 Dense Layer
    model.add(Dense(1000, kernel_regularizer=regularizers.l2(regl2)))
    model.add(Activation('relu'))
    # Dropout
    model.add(Dropout(0.4))
    # Batch normalisation
    model.add(BatchNormalization())

    # Output Layer
    model.add(Dense(num_classes))
    model.add(Activation('softmax'))

    # Compile
    adam = optimizers.Adam(learning_rate=lr)
    model.compile(loss='categorical_crossentropy', optimizer=adam,
                  metrics=['accuracy'])

    return model

and every time I train and test, the model clearly overfits: test accuracy is low, around 45%, and the training and test accuracy curves are far apart when I plot them.

How could I improve my network so that it does not overfit?

Thanks in advance.


Solution

If the model is overfitting, you can either increase regularization or simplify the model, as already suggested by @Oxbowerce: remove some of the convolutions and/or reduce the dense layers.
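As a rough sketch of what "simplify" could mean here (the filter counts, three conv blocks, and global average pooling are illustrative choices, not prescribed values):

```python
from keras.models import Sequential
from keras.layers import (Activation, BatchNormalization, Conv2D, Dense,
                          Dropout, GlobalAveragePooling2D, MaxPooling2D)

def SmallNetwork(input_shape, num_classes):
    model = Sequential()
    # Three conv blocks instead of seven: far less capacity to overfit with
    model.add(Conv2D(32, (3, 3), padding='same', activation='relu',
                     input_shape=input_shape))
    model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2)))
    # Global average pooling replaces the 4096-unit dense layers,
    # removing the vast majority of the trainable parameters
    model.add(GlobalAveragePooling2D())
    model.add(Dropout(0.4))
    model.add(Dense(num_classes, activation='softmax'))
    return model
```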

Given that you already use several different types of regularizers, I can suggest another one for convolutional layers: spatial dropout. With SpatialDropout2D, available in Keras, you can drop entire feature maps from convolutional layers instead of individual activations. For starters, you could try adding it after the first convolution in each of the C5, C6 and C7 blocks.
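For example, a C5-style block with SpatialDropout2D inserted after the first convolution might look like this (the rate of 0.2 and the 32x32x256 input shape are just illustrative values):

```python
from keras.models import Sequential
from keras.layers import (Activation, BatchNormalization, Conv2D,
                          MaxPooling2D, SpatialDropout2D)

model = Sequential()
# C5-style block with spatial dropout between the stacked convolutions:
# whole feature maps are zeroed out rather than individual units
model.add(Conv2D(256, (3, 3), padding='valid', input_shape=(32, 32, 256)))
model.add(SpatialDropout2D(0.2))
model.add(Conv2D(256, (3, 3), padding='valid'))
model.add(Activation('relu'))
model.add(MaxPooling2D((2, 2)))
model.add(BatchNormalization())
```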

But given that you have a very small dataset, I am not sure there is much room for improvement. It would be easier to comment if you also shared the training and test accuracy graphs directly. In any case, the best way to approach image recognition problems with small datasets is transfer learning; I suggest you look into it if you do not have to build the model from scratch for some other reason.
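A minimal transfer-learning sketch could look like the following. MobileNetV2 is just one convenient pretrained base (any Keras application model would do), and the frozen base plus a small new head is the standard starting recipe:

```python
from keras.applications import MobileNetV2
from keras.models import Model
from keras.layers import Dense, Dropout, GlobalAveragePooling2D

def build_transfer_model(input_shape, num_classes, weights='imagenet'):
    # Pretrained convolutional base; freezing it means the small dataset
    # only has to fit the new classification head
    base = MobileNetV2(include_top=False, weights=weights,
                       input_shape=input_shape)
    base.trainable = False
    x = GlobalAveragePooling2D()(base.output)
    x = Dropout(0.4)(x)
    out = Dense(num_classes, activation='softmax')(x)
    return Model(base.input, out)
```

With `weights='imagenet'` the pretrained weights are downloaded on first use; after the head converges, the top of the base can optionally be unfrozen and fine-tuned at a low learning rate.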

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange