Interpreting loss in a neural network: training loss gradually tapers off and validation loss never reaches a minimum

datascience.stackexchange https://datascience.stackexchange.com/questions/75587

Question

Validation and train loss

Unable to improve the network's validation loss. Is it overfitting or underfitting? How can I get a better validation loss? The code is below:

```python
import gc

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import regularizers
from tensorflow.keras.layers import (Activation, Add, AlphaDropout,
                                     BatchNormalization, Dense, Embedding,
                                     Flatten, Input, concatenate)
from tensorflow.keras.models import Model

# dense_cols: list of the dense feature column names, defined elsewhere in the notebook.

def create_model(lr=0.05):
    #tf.random.set_seed(1)

    tf.keras.backend.clear_session()
    gc.collect()

    # Dense input
    dense_input = Input(shape=(len(dense_cols), ), name='dense1')

    # Embedding input
    #Turns positive integers (indexes) into dense vectors of fixed size.
    wday_input = Input(shape=(1,), name='wday')
    month_input = Input(shape=(1,), name='month')

    event_type_1_input = Input(shape=(1,), name='event_type_1')
    item_id_input = Input(shape=(1,), name='item_id')
    dept_id_input = Input(shape=(1,), name='dept_id')
    store_id_input = Input(shape=(1,), name='store_id')
    cat_id_input = Input(shape=(1,), name='cat_id')
    state_id_input = Input(shape=(1,), name='state_id')

    wday_emb = Flatten()(Embedding(7, 3)(wday_input))
    month_emb = Flatten()(Embedding(12, 2)(month_input))

    event_type_1_emb = Flatten()(Embedding(5, 1)(event_type_1_input))


    item_id_emb = Flatten()(Embedding(3049, 3)(item_id_input))
    dept_id_emb = Flatten()(Embedding(7, 1)(dept_id_input))
    store_id_emb = Flatten()(Embedding(10, 1)(store_id_input))
    cat_id_emb = Flatten()(Embedding(3, 1)(cat_id_input))
    state_id_emb = Flatten()(Embedding(3, 1)(state_id_input))

    # Combine dense and embedding parts and add dense layers. Exit on linear scale.
    x1 = concatenate([dense_input, 
                     event_type_1_emb, 
                     wday_emb ,
                     month_emb,
                     item_id_emb, dept_id_emb, store_id_emb,
                     cat_id_emb, state_id_emb])

    x = BatchNormalization()(x1)

    x = Dense(7142, activation=None, kernel_initializer='lecun_normal', kernel_regularizer=regularizers.l1_l2(0.001))(x)
    x = BatchNormalization()(x)
    x = Activation("selu")(x)
    x = AlphaDropout(0.30)(x)

    x = Dense(714, activation=None, kernel_initializer='lecun_normal', kernel_regularizer=regularizers.l2(0.001))(x)
    x = BatchNormalization()(x)
    x = Activation("selu")(x)
    x = AlphaDropout(0.3)(x)
    x = Dense(34, activation=None, kernel_initializer='lecun_normal', kernel_regularizer=regularizers.l2(0.001))(x)
    x = BatchNormalization()(x)
    x = Activation("selu")(x)
    # Residual connection back to the concatenated inputs (their widths must match).
    x = Add()([x, x1])
    outputs = Dense(1, activation="softplus", name='output', kernel_regularizer=regularizers.l2(0.001))(x)

    inputs = {"dense1": dense_input, "wday": wday_input, "month": month_input,# "year": year_input, 

             "event_type_1": event_type_1_input,

              "item_id": item_id_input, "dept_id": dept_id_input, "store_id": store_id_input, 
              "cat_id": cat_id_input, "state_id": state_id_input}

    # Connect input and output
    model = Model(inputs, outputs)



    model.compile(loss=keras.losses.mean_squared_error,
                  metrics=["mse","mape","mae"],
                  #optimizer=keras.optimizers.SGD(learning_rate=lr_schedule))
                  optimizer=keras.optimizers.RMSprop(learning_rate=lr))
    return model
```

Solution

This is a case of overfitting: your model performs very well after just a few epochs, but only on the training set, while on the validation set the loss improves too slowly.
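
As a minimal sketch (not part of the original post), the usual way to confirm this is to compare the training and validation loss histories returned by `model.fit()`. The names `X_train`, `y_train`, `X_val`, `y_val` below are hypothetical placeholders for your own data split; for this model the `X_*` objects would be dicts of arrays keyed by the input names.

```python
import matplotlib.pyplot as plt
from tensorflow import keras

model = create_model(lr=0.001)

# X_train / X_val: hypothetical dicts of input arrays keyed by the Input
# names ("dense1", "wday", "month", ...); y_train / y_val: the targets.
history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=50,
    batch_size=1024,
    callbacks=[keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=5, restore_best_weights=True)],
)

# A training curve that keeps dropping while the validation curve flattens
# (or rises) is the classic signature of overfitting.
plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("MSE loss")
plt.legend()
plt.show()
```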

There are several tricks that you can use:

1) A simpler network: for example, you can lower the dimensions of the embedding layers. Some of the embeddings you are using could also be avoided: month_emb, which I suppose stands for "month", has a cardinality of only 12. It is much better to use the month as a simple number or as a categorical feature (see the sketch after this list).

2) Get more data: this is the hardest and sometimes impossible way to proceed, but it is often the best one.

3) Reduce the number of inputs by transforming or joining some of them.
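
As a concrete illustration of suggestion 1) — a minimal sketch, not the author's exact fix — the embedding widths can be reduced and the month passed through as a plain numeric input instead of a learned embedding; the remaining inputs would be wired up the same way as in the original `create_model()`:

```python
from tensorflow.keras.layers import Input, Embedding, Flatten, concatenate

# Smaller embeddings for the categorical inputs.
wday_input = Input(shape=(1,), name='wday')
item_id_input = Input(shape=(1,), name='item_id')
wday_emb = Flatten()(Embedding(7, 2)(wday_input))           # 3 -> 2 dimensions
item_id_emb = Flatten()(Embedding(3049, 2)(item_id_input))  # 3 -> 2 dimensions

# Month fed directly as a number (it could equally be one-hot encoded upstream).
month_input = Input(shape=(1,), name='month')

x1 = concatenate([wday_emb, item_id_emb, month_input])
```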

Licensed under: CC-BY-SA with attribution