Validation accuracy is always close to training accuracy

https://datascience.stackexchange.com/questions/42606

01-11-2019
|

Pergunta

I am trying to tune the hyperparameters of a LSTM I have to do time series forecasting. I have noticed that my validation accuracy is always very close to my training accuracy. I am not sure whether or not this is good or bad or what it implies in general?

At the moment I have kept all the hyperparameters the same and have only varied the number of units in the LSTM layer from [1, 5, 26].

I expected 26 units to give me good results and have since added units 1 and 5 to help me investigate. I was also expecting to see that my validation accuracies are worse than my training accuracies but this does not seem to be the case. My validation accuracies track the training accuracies very well. Is this something to be concerned about?

The plot below shows the average loss, average training and validation accuracies.

You can see from the plot that each of the units training and validation accuracies stay very close together. Why is this? Does this mean that in general my model should generalise well on unseen data as there isn't much of a discrepancy?

Obviously the accuracies of the model are not very good yet in general and it requires some tuning but to do the tuning process I was more expecting to see some differences between the training and validation data and not that they would stay largely similar to each other.

EDIT:

Further information as requested. The hyperparameters used for each the model can be seen in the legend of the plots. Here is an updated plot which shows a greater variation in accuracy from varying the number of neurons and batch size:

From the plot it seems as if the model accuracy converges the quickest with a batch size of 1 and 39 neurons. However it is characteristic amongst all that the training and validation accuracies track each other closely. I hadn't expected that varying two hyperparameters independently would lead to a consistent result like that.

In this problem I am working with, the model is to provide forecasts for one time series. I am using it as a 'toy' problem to try and learn what kind of models work well for my particular problem.

I have 315 data points in total for the time series. I have left the last 52 out as a hold out test set which I haven't ever looked at to this point. 52 points as I am making weekly predictions. I then left the next last 52 points out to use as a validation set. Due to the time dependency I am unable to use a validation method such as KFold cross validation. Therefore I am using something called rolling origin analysis which is explained well here (see under predictive performance). this simulates well how the model would be used in practice and my accuracies are measurements using a modified sMAPE formula for a multi-step forecast.

Essentially what this means is that I train (in this case) 27 separate models. I have a forecast horizon of 26, so what I do is take my train data and take the first 26 points of my validation set.

I train my first model on the train data (which is 131 samples) and this is validated against 1 sample.

I train the next model which uses the same training data except it is shifted one along so the very first point from train is dropped and the first point from the validation set is appended to the end of the train set. The validation set is then also shifted along by one. As a quick example (numbers represent indices of the time series):

Model1 train: [1, 2, 3, 4, 5], val: [6, 7, 8]

Model2 train: [2, 3, 4, 5, 6], val: [7, 8, 9]

The average accuracy shown in the plots is therefore the average of the accuracies that each model computed against its own validation set.

So each model is trained on the same amount of data and each model is completely fitted anew, that is Model1 doesn't share any information with Model2.

The accuracies are computed in this way because I do not have much data and this is the best way I have found that allows me to get more than two validation sets and more than two test sets.

Nenhuma solução correta

Licenciado em: CC-BY-SA com atribuição

Não afiliado a datascience.stackexchange