Question

I saw a similar question, but I think my problem is different. During training, both the training loss and the validation loss hover around one value, not decreasing significantly. I have 122,707 training observations and 52,589 test observations, with 55 explanatory variables and one dependent variable. The model is one Conv1D layer with 24 filters, two LSTM layers with 24 units each, and one Dense layer, with a dropout rate of 0.2 between the layers; 13,417 parameters in total. It seems like my model is not learning at all. Does this mean the dataset is not a good representation of the specific problem? Should I increase the number of epochs? I use the Adam optimizer with the default learning rate.

[Figure: validation and training loss]

[Figure: validation and training loss during the last 100 epochs]

Adding additional info: I am trying to predict next-hour air pollution based on the previous hour's pollution concentration and meteorological data such as temperature, wind speed, etc. Day, hour, and month are also included and one-hot encoded. Additionally, the wind direction in degrees is decomposed into its sine and cosine components (a sketch of this encoding follows below). I previously tried normalizing the data, but it didn't seem to make any difference. I haven't tried any other models. Here is the model:

import tensorflow as tf

model = tf.keras.models.Sequential([
    # Causal convolution over the 55 input features
    tf.keras.layers.Conv1D(filters=24, kernel_size=3,
                           strides=1, padding="causal",
                           activation="relu",
                           input_shape=[None, 55]),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.LSTM(24, return_sequences=True),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.LSTM(24, return_sequences=True),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1),
    # Scale the output up towards the target's range (its maximum value is 1000)
    tf.keras.layers.Lambda(lambda x: x * 1000)
])

Somewhere I saw a Lambda layer used after the Dense layer in regression. I noticed that adding the Lambda layer at the end speeds up learning. I multiply the output by 1000 because that is the maximum value of the variable I want to predict.
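For completeness, here is roughly how the time and wind features mentioned above are encoded. This is a sketch with assumed column names (wind_deg, day, hour, month), not my exact preprocessing code:

import numpy as np
import pandas as pd

# Raw hourly data; the file name and column names are assumed for illustration
df = pd.read_csv("pollution.csv")

# Decompose wind direction (degrees) into sin/cos so 359° and 1° end up close together
rad = np.deg2rad(df["wind_deg"])
df["wind_sin"] = np.sin(rad)
df["wind_cos"] = np.cos(rad)
df = df.drop(columns=["wind_deg"])

# One-hot encode the calendar features
df = pd.get_dummies(df, columns=["day", "hour", "month"])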


Solution

Rather than oscillations, that looks like white noise, as if the loss were following a random walk. In other words, as you said, your model is not learning anything.

Unfortunately it's impossible to say what's wrong, since we can't see any code. We need more information: the dataset, how you processed it, the model implementation, all the hyperparameters you chose, what other versions you tried before this one, and so on; the list is endless. But most importantly, it's really hard to help you without code.

If the dataset is a good one, the problem must be some error made along the way.


EDIT: Here's what I think:

  • You don't need Conv layers followed by RNN layers; that combination doesn't really make sense here. Let the LSTM receive the raw input.

  • Don't use Dropout with RNNs; they don't go well together. Dropout makes sense with Dense and Conv layers, but in RNNs, where the sequence is everything, it can actually make things worse. Some people use recurrent dropout as an alternative, but it's not necessary here.

  • Don't use return_sequences=True between an LSTM and a Dense layer; it should be used between LSTM layers only (see the shape check after this list).

  • That Lambda layer at the end is probably causing most of the error. If you multiply all your predictions by 1000, what you get is by definition a prediction on a completely different scale from your target value.

  • The network overall is too deep and has too many parameters. I assume you are working with the famous Beijing air quality dataset; in that case, one LSTM layer followed by a single Dense node to make the prediction is enough. Everything else is overkill and unnecessary for a simple dataset like that.
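To make the return_sequences point concrete, here is a quick shape check (a sketch assuming a batch of 32 windows, 24 timesteps, and the 55 features mentioned above):

import tensorflow as tf

x = tf.random.normal((32, 24, 55))  # (batch, timesteps, features)

# return_sequences=True emits one output per timestep ...
seq_out = tf.keras.layers.LSTM(24, return_sequences=True)(x)
print(seq_out.shape)  # (32, 24, 24)

# ... so a Dense(1) head then predicts at every timestep, not just the next hour
print(tf.keras.layers.Dense(1)(seq_out).shape)  # (32, 24, 1)

# return_sequences=False (the default) keeps only the last output,
# which is what a one-step-ahead forecast needs
print(tf.keras.layers.LSTM(24)(x).shape)  # (32, 24)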

Something much simpler, like:

model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(24, input_shape=(seq_len, n_vars)),
    tf.keras.layers.Dense(1),
])

has a higher chance of working (please set the input shape to match your data). Try playing with its hyperparameters after making sure all variables are scaled consistently between the train and test data.
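As a sketch of that last point, fit the scaler on the training split only and reuse its statistics on the test split. Here X_train, X_test, and y_train are assumed to be NumPy arrays already windowed to shape (samples, seq_len, n_vars):

import numpy as np
from sklearn.preprocessing import StandardScaler

n_train, seq_len, n_vars = X_train.shape

# Fit scaling statistics on the training data only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train.reshape(-1, n_vars)).reshape(n_train, seq_len, n_vars)
# Apply the same statistics to the test data (no refitting)
X_test = scaler.transform(X_test.reshape(-1, n_vars)).reshape(-1, seq_len, n_vars)

model.compile(loss="mae", optimizer="adam")
model.fit(X_train, y_train, validation_split=0.1, epochs=50)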

Good luck!

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange