Question

I have a small dataset on which I want to train a CNN using data augmentation. Since the CNN is overfitting due to the small dataset, I would like to optimize some hyperparameters. I plan to use GridSearchCV from Scikit-Learn for this, and to keep the computational cost down I want to tune only three hyperparameters. The question is: which combination of hyperparameters should I use for the grid search?

My current approach would be to optimize the learning rate, the dropout rate, and the number of epochs.

I chose the learning rate because the book "Deep Learning" by Goodfellow et al. recommends always tuning it. But I'm not sure whether my combination of hyperparameters is really a good choice.

What combination would you recommend? Many thanks for any hints.

My current architecture is as follows:

# Imports needed to make this runnable (assuming TensorFlow 2.x Keras)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, BatchNormalization, Activation,
                                     MaxPooling2D, Dropout, Flatten, Dense)

model = Sequential()
# First convolutional block; input_shape is an assumption (e.g. 32x32 RGB
# images) and should be adjusted to the actual data
model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))

# Second convolutional block
model.add(Conv2D(64, (3, 3)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))

model.add(Flatten())

# Fully connected classifier head (10 classes)
model.add(Dense(512))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.4))
model.add(Dense(10))
model.add(Activation('softmax'))

As the optimizer I use Adam.


Solution

If you are asking specifically about overfitting, I would keep only the dropout rate from your list of three. The other two can be chosen from: the number of filters in the convolutional layers, the number of convolutional layers, the number of dense layers (if any), and the number of neurons in the dense layers.
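For illustration, here is a minimal sketch of how such a search could be wired up, assuming the SciKeras wrapper (scikeras.wrappers.KerasClassifier) to make the Keras model compatible with GridSearchCV; the build_model function, the input shape, and the value ranges are placeholders to adapt, not recommendations:

# A minimal sketch, assuming the SciKeras wrapper and hypothetical value ranges
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

def build_model(filters=32, dense_units=512, dropout_rate=0.2):
    # Simplified version of the architecture above, parameterized
    # by the hyperparameters we want to search over
    model = Sequential([
        Conv2D(filters, (3, 3), activation='relu', input_shape=(32, 32, 3)),
        MaxPooling2D(pool_size=(2, 2)),
        Dropout(dropout_rate),
        Flatten(),
        Dense(dense_units, activation='relu'),
        Dropout(dropout_rate),
        Dense(10, activation='softmax'),
    ])
    return model

clf = KerasClassifier(
    model=build_model,
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    epochs=30,
    verbose=0,
)

# The model__ prefix routes these parameters to build_model()
param_grid = {
    'model__filters': [32, 64],
    'model__dense_units': [256, 512],
    'model__dropout_rate': [0.2, 0.4],
}

grid = GridSearchCV(clf, param_grid, cv=3)
# grid.fit(X_train, y_train)  # X_train/y_train are your (augmented) data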

The learning rate should be optimized, but not for the purpose of combating overfitting, at least the way I understand it. Using Keras, you can start with some learning rate (not necessarily finely tuned) and slowly reduce it once training reaches a plateau, using a learning-rate scheduler from the Keras callbacks API. There are also methods for finding a good learning rate, such as a learning-rate finder, so it would probably be a waste of resources to optimize the learning rate with grid search (though I have never used them myself!).
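As a sketch, the plateau-based reduction can be done with Keras's ReduceLROnPlateau callback; the factor, patience, and min_lr values below are illustrative:

# Halve the learning rate when val_loss stops improving for 5 epochs
from tensorflow.keras.callbacks import ReduceLROnPlateau

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                              patience=5, min_lr=1e-6)

# model.fit(..., validation_data=(X_val, y_val), callbacks=[reduce_lr])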

You can also use the ModelCheckpoint callback from the same callbacks API to save the model only when the validation loss improves, and ignore the later epochs where overfitting becomes an issue. That way, you won't need to include the number of epochs in your search.
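For example (the file name is a placeholder):

# Keep only the weights from the epoch with the best validation loss
from tensorflow.keras.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint('best_model.keras', monitor='val_loss',
                             save_best_only=True)

# model.fit(..., validation_data=(X_val, y_val), callbacks=[checkpoint])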
