Question

I noticed something when looking at the verbose training output. When I train my model, the loss decreases a lot in the early part of each epoch (the first 20%), and then stays very stable for the rest of the epoch (the last 80%) until the next epoch starts, where the same pattern repeats.

I built a model that trains on a fairly large dataset (60,000 entries). I am using Keras and TensorFlow, and my model is a simple regression model with Conv2D and Dense layers. I am trying to get the lowest loss possible. The loss function I am using is plain Mean Squared Error (MSE), and the optimizer is Adam.

I don't understand why the loss decreases a lot at the beginning of the epoch and much less for the rest of the training. Should I reduce the size of my dataset? Is this a sign of overfitting?

Is there a way to speed this process up (something like early stopping, but within an epoch)? As I see it, most of the learning happens on the first part of the data. Maybe I am wrong.


Solution

Have you already plotted the training and validation loss over multiple epochs? What you would expect is a curve that looks like exponential decay.
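For reference, here is a minimal sketch of how to produce such a plot with Keras, assuming your `model.fit` call is assigned to a `history` variable (the `model`, `x_train`, `y_train`, `x_val`, and `y_val` names are illustrative placeholders for your own data):

```python
import matplotlib.pyplot as plt

# Train and keep the History object that Keras returns
history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=50,
)

# Plot the per-epoch training and validation loss
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("MSE loss")
plt.legend()
plt.show()
```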

Chances are your model learns the mean of the target variable in the first few batches, which already reduces the MSE loss considerably; only later does it learn the subtler structure in your data. You can also try different batch sizes.
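A quick way to check this is to compare your loss against the MSE of a trivial model that always predicts the mean of the training targets (a rough sketch, assuming your targets are in a NumPy array `y_train`):

```python
import numpy as np

# MSE of always predicting the training-set mean, which equals the
# variance of the targets. If your training loss plateaus near this
# value, the network has essentially only learned the mean so far.
baseline_mse = np.mean((y_train - y_train.mean()) ** 2)
print(f"mean-prediction baseline MSE: {baseline_mse:.4f}")
```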

Are you generally happy with the performance of your model? Does it perform well on the test/validation set? If so, there may be no reason to worry.

OTHER TIPS

From your description, it seems that you are not shuffling your training data.

You should shuffle your data, and do it differently at every epoch. Once the data is shuffled, you should not see the behavior you describe.
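In Keras, `model.fit` shuffles by default (`shuffle=True`) when you pass NumPy arrays, but if you feed a `tf.data.Dataset` you need to shuffle it yourself. A minimal sketch of a pipeline that reshuffles with a new order at every epoch (the buffer and batch sizes, and the `x_train`/`y_train`/`model` names, are illustrative):

```python
import tensorflow as tf

# Build a dataset that is reshuffled at the start of every epoch
dataset = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .shuffle(buffer_size=60_000, reshuffle_each_iteration=True)
    .batch(32)
)

model.fit(dataset, epochs=50)
```

With `reshuffle_each_iteration=True`, each pass over the dataset sees the examples in a different order, which is exactly the "do it differently at every epoch" behavior described above.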

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange