Question

I am training an LSTM to predict a price chart. I am using Bayesian optimization to speed things up a little, since I have a large number of hyperparameters and only my CPU as a resource.

Running 100 iterations over the hyperparameter space, with 100 training epochs for each, still takes too much time to find a decent set of hyperparameters.

My idea is this: if I only train for one epoch during the Bayesian optimization, is that still a good enough indicator of the best overall loss? This would speed up the hyperparameter optimization quite a bit, and later I could afford to re-train the best 2 or 3 hyperparameter sets with 100 epochs. Is this a good approach?
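Roughly, what I have in mind for this first option is something like the sketch below (the model, data and search space are just placeholders to illustrate the idea, not my real setup):

    # Sketch: 1-epoch trials with hyperopt's TPE, then re-train the best
    # configuration with the full epoch budget. Data below is dummy data.
    import numpy as np
    from hyperopt import fmin, tpe, hp, Trials, STATUS_OK
    from tensorflow import keras

    x_train = np.random.rand(256, 30, 1)   # placeholder sequences
    y_train = np.random.rand(256, 1)

    def build_model(params):
        model = keras.Sequential([
            keras.layers.LSTM(int(params['units']), input_shape=(30, 1)),
            keras.layers.Dense(1),
        ])
        model.compile(loss='mse',
                      optimizer=keras.optimizers.Adam(params['learning_rate']))
        return model

    def objective(params):
        model = build_model(params)
        # only 1 epoch per trial to keep the search cheap
        hist = model.fit(x_train, y_train, epochs=1,
                         validation_split=0.2, verbose=0)
        return {'loss': hist.history['val_loss'][-1], 'status': STATUS_OK}

    space = {
        'units': hp.quniform('units', 32, 256, 32),
        'learning_rate': hp.loguniform('learning_rate',
                                       np.log(1e-4), np.log(1e-2)),
    }

    best = fmin(objective, space, algo=tpe.suggest,
                max_evals=100, trials=Trials())

    # re-train the best configuration with the full 100 epochs
    final_model = build_model(best)
    final_model.fit(x_train, y_train, epochs=100,
                    validation_split=0.2, verbose=0)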

The other option is to keep 100 epochs for each training run but decrease the number of iterations, i.e. the number of training runs with different hyperparameter sets.

Any opinions and/or tips on the above two solutions?

(I am using Keras for the training and hyperopt for the Bayesian optimization.)


Solution

First of all, you might want to know there is a "new" Keras Tuner, which includes BayesianOptimization, so building an LSTM with Keras and optimizing its hyperparameters is essentially a plug-and-play task with Keras Tuner :) You can find a recent answer I posted about tuning an LSTM for time series with Keras Tuner here.

So, two points I would consider:

  • I would not loop over your dataset only once; that does not sound like enough to find the right weights. I would rather control the number of hyperparameter configurations that get tried, as you said, which is something you can set in Keras Tuner via the max_trials parameter.

  • About using Keras Tuner with the Bayesian tuner, you can find some code below as an example of tuning the units (nodes) in the hidden layers and the learning rate:

    import os
    from tensorflow import keras
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense
    from kerastuner.tuners import BayesianOptimization

    n_input = 6

    def build_model(hp):
        model = Sequential()
        # each tuned layer gets its own hyperparameter name, otherwise both
        # layers would share the same sampled value
        model.add(LSTM(units=hp.Int('lstm_units', min_value=32,
                                    max_value=512,
                                    step=32),
                       activation='relu', input_shape=(n_input, 1)))
        model.add(Dense(units=hp.Int('dense_units', min_value=32,
                                     max_value=512,
                                     step=32), activation='relu'))
        model.add(Dense(1))
        model.compile(loss='mse', metrics=['mse'], optimizer=keras.optimizers.Adam(
            hp.Choice('learning_rate',
                      values=[1e-2, 1e-3, 1e-4])))
        return model

    bayesian_opt_tuner = BayesianOptimization(
        build_model,
        objective='mse',
        max_trials=3,
        executions_per_trial=1,
        directory=os.path.normpath('C:/keras_tuning'),
        project_name='kerastuner_bayesian_poc',
        overwrite=True)

    n_epochs = 100  # epochs per trial; train_x / train_y are your training data
    bayesian_opt_tuner.search(train_x, train_y, epochs=n_epochs,
                              # validation_data=(X_test, y_test)
                              validation_split=0.2, verbose=1)

    # retrieve the best model found during the search
    bayes_opt_model_best_model = bayesian_opt_tuner.get_best_models(num_models=1)
    model = bayes_opt_model_best_model[0]
    

You would get something like this, informing you about the searched configurations and evaluation metrics:

[screenshot of the tuner's trial-by-trial search output]
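If you still want to follow your idea of a cheap search followed by a longer re-training, Keras Tuner also lets you pull out the best hyperparameter configurations and rebuild the model for a longer run. A minimal sketch, assuming the bayesian_opt_tuner object from the code above (the epoch count and train_x / train_y are placeholders):

    # re-train the top configurations found by the search for more epochs
    best_hps = bayesian_opt_tuner.get_best_hyperparameters(num_trials=3)

    for hp_set in best_hps:
        model = bayesian_opt_tuner.hypermodel.build(hp_set)
        model.fit(train_x, train_y, epochs=100, validation_split=0.2, verbose=1)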

Other tips

Here you can find the code to train an LSTM via Keras and tune it via Keras Tuner with the Bayesian option:

# 2 epochs with 20 max_trials
import os
from tensorflow import keras
from kerastuner import BayesianOptimization

def build_model(hp):
    model = keras.Sequential()
    model.add(keras.layers.LSTM(units=hp.Int('units', min_value=8,
                                             max_value=64,
                                             step=8),
                                activation='relu',
                                input_shape=x_train_uni.shape[-2:]))
    model.add(keras.layers.Dense(1))

    model.compile(loss='mae', optimizer=keras.optimizers.Adam(
            hp.Choice('learning_rate',
                      values=[1e-2, 1e-3, 1e-4])),
                  metrics=['mae'])
    return model

# define the tuner
bayesian_opt_tuner = BayesianOptimization(
    build_model,
    objective='mae',
    max_trials=20,
    executions_per_trial=1,
    directory=os.path.normpath('C:/keras_tuning'),
    project_name='timeseries_temp_ts_test_from_TF_ex',
    overwrite=True)

EVALUATION_INTERVAL = 200
EPOCHS = 2

# train_univariate / val_univariate are tf.data datasets built from the
# univariate temperature series
bayesian_opt_tuner.search(train_univariate,  # X_train, y_train,
             epochs=EPOCHS,
             validation_data=val_univariate,
             validation_steps=50,
             steps_per_epoch=EVALUATION_INTERVAL
             # batch_size=int(len(X_train)/2)
             # validation_split=0.2, verbose=1
             )

I did it with a temperatures dataset, varying both the number of epochs and the number of hyperparameter combinations. I think it also depends on the dataset you are working with; for the one I quickly tried (with non-representative results, since the experiment should be repeated enough times to get a distribution of results for each case) I did not see much difference (we should check it with a hypothesis test for a robust conclusion), but you can play with it. My quick results:

20 epochs, 2 hyperparameter combinations: [results screenshot]

2 epochs, 20 hyperparameter combinations: [results screenshot]

Licensed under: CC-BY-SA with attribution