Question

I have 2 instances of an object detection model. The only difference between these two models is the training data used:

  1. The first model was trained with a small training set
  2. The second model was trained on a larger training set than the first one

The first model was trained on the following hyperparameters:

  • Number of iterations: 250k
  • Batch Size: 10
  • Learning Rate: warms up to 0.001 and decreases to 0.0002 after 150k iterations

Since the second model has more training data, I assumed I needed to change the hyperparameters a bit, so I tried training the second model with the following hyperparameters (both schedules are sketched in code after this list):

  • Number of iterations: 600k
  • Batch Size: 10
  • Learning Rate: warms up to 0.001 and decreases to 0.0002 after 400k iterations
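For concreteness, here is a minimal sketch of the two schedules above; the 1,000-iteration warmup length is a placeholder I added, since I only said the rate warms up without giving a duration:

```python
# Minimal sketch of the two learning-rate schedules listed above.
# Only the peak rate (0.001), the decayed rate (0.0002) and the decay
# points (150k / 400k iterations) come from the configurations; the
# 1,000-iteration warmup length is a placeholder.

def lr_schedule(iteration, decay_at, warmup_iters=1000,
                base_lr=0.001, decayed_lr=0.0002):
    """Linear warmup to base_lr, then a single step decay at decay_at."""
    if iteration < warmup_iters:
        return base_lr * (iteration + 1) / warmup_iters
    return base_lr if iteration < decay_at else decayed_lr

# Model 1: 250k iterations, decay at 150k
lrs_model1 = [lr_schedule(i, decay_at=150_000) for i in range(250_000)]
# Model 2: 600k iterations, decay at 400k
lrs_model2 = [lr_schedule(i, decay_at=400_000) for i in range(600_000)]
```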

When I measure the mAP for both models on a testing set, the first model vastly outperforms the second model.

model 1 mAP: 0.924

model 2 mAP: 0.776

This leads me to my question for you:

How would the hyperparameters (batch size, learning rate, etc.) change when the size of your training set increases? What factors need to be considered when the training set grows, in order to get the best possible model?

Any and all responses would be greatly appreciated. Thank you :)


Solution

A major difference between the first and the second model you trained is the size of the training data, assuming the model is not pretrained. More data naturally calls for more epochs, and accordingly the batch size should also increase.

Batch Size:

  • While training on the smaller dataset, a batch size of 10 yielded good results: the errors were averaged over 10 samples and then back-propagated through the model. For the larger dataset, however, the batch size stays at 10, so each update still averages the error over only 10 samples, which is relatively small for a large dataset, and each optimization step therefore makes little progress (see the sketch below).
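As a rough illustration of that averaging (a toy example rather than your detector's actual training loop, assuming PyTorch with its default mean reduction):

```python
import torch

# Toy illustration: the gradient comes from the loss averaged over the
# mini-batch, so a batch of 10 gives a noisier estimate of the full
# dataset's gradient as the dataset grows.
torch.manual_seed(0)
weights = torch.randn(5, requires_grad=True)

batch_data = torch.randn(10, 5)   # one mini-batch of 10 samples
batch_targets = torch.randn(10)

predictions = batch_data @ weights
loss = torch.nn.functional.mse_loss(predictions, batch_targets)  # mean over the 10 samples
loss.backward()
print(weights.grad)  # gradient of the batch-averaged loss
```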

Learning Rate:

  • For the larger dataset, the number of epochs is increased. The purpose of the learning rate is to scale the gradients of the loss with respect to the parameters. A smaller learning rate helps prevent overshooting the minimum of the loss function, but here I would suggest increasing the learning rate so that optimization does not slow down over the larger number of epochs, and then gradually decreasing it as the loss approaches its minimum (one possible schedule is sketched below).
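One widely used heuristic you could borrow here is the linear scaling rule: grow the learning rate in proportion to the batch size, and decay it gradually afterwards. The numbers below (a batch size of 20, a 1,000-iteration warmup, a cosine decay) are placeholder assumptions, not values from your setup:

```python
import math

def scaled_lr(base_lr=0.001, base_batch=10, new_batch=20):
    """Linear scaling rule: scale the learning rate with the batch size."""
    return base_lr * new_batch / base_batch

def cosine_schedule(iteration, total_iters, peak_lr, warmup_iters=1000):
    """Linear warmup followed by a cosine decay towards zero."""
    if iteration < warmup_iters:
        return peak_lr * (iteration + 1) / warmup_iters
    progress = (iteration - warmup_iters) / max(1, total_iters - warmup_iters)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

# Hypothetical settings: batch size doubled from 10 to 20, 600k iterations.
peak = scaled_lr(new_batch=20)
for step in (0, 1_000, 300_000, 599_999):
    print(step, round(cosine_schedule(step, 600_000, peak), 6))
```

Whichever schedule you pick, the point is to adjust the peak rate and the decay together, rather than stretching only the decay point while everything else stays fixed.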

If you are training a popular architecture (like Inception, VGG, etc.) with only minor modifications, and on datasets like ImageNet or COCO, you should definitely read the research papers published on these problems, as they provide a much better starting point for training.

Licensed under: CC-BY-SA with attribution