Question

I am evaluating a neural network model using cross-validation in two different ways (A and B) that I thought were equivalent.

  • Evaluation type A: a new model is instantiated and fitted inside each cross-validation loop.
  • Evaluation type B: the model is instantiated once, and that same instance is fitted in each loop of the cross-validation procedure.

I am using mean absolute error (MAE) as the metric.

Question: Why do I get a continuously decreasing MAE over cross-validation loops when using type B evaluation and not when using type A evaluation?

Code and details

First, I generate synthetic data:

from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=1, random_state=2)

I then define a function to get a model (a neural network):

from keras.models import Sequential
from keras.layers import Dense

def get_model(n_nodes_hidden_layer, n_inputs, n_outputs):
    model = Sequential()
    # One hidden ReLU layer with He-uniform initialization
    model.add(Dense(n_nodes_hidden_layer, input_dim=n_inputs,
                    kernel_initializer='he_uniform', activation='relu'))
    # Linear output layer; MAE loss matches the evaluation metric
    model.add(Dense(n_outputs))
    model.compile(loss='mae', optimizer='adam')
    return model

After that, I define the two evaluation functions, using:

from sklearn.model_selection import RepeatedKFold
from sklearn.metrics import mean_absolute_error

Type A evaluation function:

def evaluate_model_A(X, y):
    results = list()
    cv = RepeatedKFold(n_splits=10, n_repeats=1, random_state=999)

    for train_ix, test_ix in cv.split(X):
        X_train, X_test = X[train_ix], X[test_ix]
        y_train, y_test = y[train_ix], y[test_ix]

        # A fresh model (fresh weights) is created for every fold
        model = get_model(20, 10, 1)
        model.fit(X_train, y_train, epochs=100, verbose=0)

        y_test_pred = model.predict(X_test)
        mae = mean_absolute_error(y_test, y_test_pred)
        results.append(mae)

        print(f'mae : {mae}')

    return results

Type B evaluation function:

def evaluate_model_B(model, X, y):
    results = list()
    cv = RepeatedKFold(n_splits=10, n_repeats=1, random_state=999)

    for train_ix, test_ix in cv.split(X):
        X_train, X_test = X[train_ix], X[test_ix]
        y_train, y_test = y[train_ix], y[test_ix]

        # The same model instance is fitted again on every fold
        model.fit(X_train, y_train, epochs=100, verbose=0)

        y_test_pred = model.predict(X_test)
        mae = mean_absolute_error(y_test, y_test_pred)
        results.append(mae)

        print(f'mae : {mae}')

    return results

Before using the type B evaluation function, I need to instantiate the model, because it is an argument of the function:

model = get_model(20, 10, 1)
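The two evaluations are then run as follows (a minimal sketch; the results_A and results_B names are just for illustration):

results_A = evaluate_model_A(X, y)
results_B = evaluate_model_B(model, X, y)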

What I do not understand is why, when using the type B evaluation function, the MAE decreases at each cross-validation loop, which is not the case with the type A evaluation function.

Is this specific to neural networks?

Note: when I use a RandomForestRegressor() instead, the phenomenon does not show up.
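The random forest check looks like this (a minimal sketch; the rf variable name and random_state=0 are just for illustration):

from sklearn.ensemble import RandomForestRegressor

# Reuse one instance across folds, exactly as in evaluation type B
rf = RandomForestRegressor(random_state=0)
cv = RepeatedKFold(n_splits=10, n_repeats=1, random_state=999)

for train_ix, test_ix in cv.split(X):
    rf.fit(X[train_ix], y[train_ix])
    y_test_pred = rf.predict(X[test_ix])
    print(f'mae : {mean_absolute_error(y[test_ix], y_test_pred)}')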


Solution

In the type B approach, your neural network's weights and biases are not reset before each cross-validation loop: calling fit() on a Keras model resumes training from the current weights. The network therefore keeps learning from one loop to the next, and because each fold's test set was part of the training data of other folds, later folds are evaluated on data the model has already seen, so you see the MAE continuously decreasing.

A solution is to save the weights and biases right after instantiating the model and to reload them at the start of each loop, so that every fold starts from the same initialization.

You can use these methods to do so:

model.save_weights('model.h5') # right after model instantiation
model.load_weights('model.h5') # in the loop before fitting
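Putting it together, a minimal sketch of the corrected type B loop (the name evaluate_model_B_fixed is just illustrative; it reuses the RepeatedKFold and mean_absolute_error imports from above):

def evaluate_model_B_fixed(model, X, y):
    results = list()
    cv = RepeatedKFold(n_splits=10, n_repeats=1, random_state=999)

    model.save_weights('model.h5')  # snapshot the initial weights once

    for train_ix, test_ix in cv.split(X):
        X_train, X_test = X[train_ix], X[test_ix]
        y_train, y_test = y[train_ix], y[test_ix]

        model.load_weights('model.h5')  # reset to the initial weights
        model.fit(X_train, y_train, epochs=100, verbose=0)

        y_test_pred = model.predict(X_test)
        results.append(mean_absolute_error(y_test, y_test_pred))

    return results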

In evaluation type A, because you instantiate the model inside the loop, the weights and biases are reset on every fold, so you don't see the phenomenon. The same goes for RandomForestRegressor: scikit-learn estimators retrain from scratch on every fit() call (unless warm_start=True), which is why the phenomenon does not show up there.
