Question

This question is specific to the Python library scikit-learn. Please let me know if it would be better to post it somewhere else. Thanks!

Now the question...

I have a feed-forward neural network class ffnn based on BaseEstimator, which I train with SGD. It works fine, and I can also train it in parallel using GridSearchCV().

Now I want to implement early stopping in ffnn.fit(), but for this I also need access to the validation data of the fold. One way of doing this is to change the line in sklearn.grid_search.fit_grid_point() that says

clf.fit(X_train, y_train, **fit_params)

into something like

clf.fit(X_train, y_train, X_test, y_test, **fit_params)

and also change ffnn.fit() to take these arguments. However, this would also affect other classifiers in sklearn, which is a problem. I could avoid that by checking for some kind of flag in fit_grid_point() that tells it which of the two ways to call clf.fit().

Can someone suggest a different way to do this where I don't have to edit any code in the sklearn library?

Alternatively, would it be right to further split X_train and y_train into train/validation sets randomly and check for a good stopping point, then re-train the model on all of X_train?

Thanks!


Solution

You could just make your neural network model internally extract a validation set from the passed X_train and y_train, for instance with the train_test_split function.

Edit:

Alternatively, would it be right to further split X_train and y_train into train/validation sets randomly and check for a good stopping point, then re-train the model on all of X_train?

Yes, but that would be expensive. You could instead just find the stopping point and then do a single additional pass over the validation data that you used to find it.
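To make the cheap variant concrete, here is a minimal sketch (the function names and the simple full-batch logistic model are illustrative stand-ins, not the asker's ffnn): train with early stopping on an internal validation split, then take one extra gradient pass over that validation data rather than refitting on all of X_train:

```python
import numpy as np
from sklearn.model_selection import train_test_split

def sgd_logistic_step(w, b, X, y, lr):
    """One full-batch gradient step on the logistic loss (illustrative model)."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return w - lr * X.T @ (p - y) / len(y), b - lr * np.mean(p - y)

def fit_with_early_stopping(X, y, lr=0.5, max_epochs=200, patience=5):
    # carve a validation set out of the data passed in
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.1,
                                                random_state=0)
    w, b = np.zeros(X.shape[1]), 0.0
    best, bad = (np.inf, w, b), 0
    for _ in range(max_epochs):
        w, b = sgd_logistic_step(w, b, X_tr, y_tr, lr)
        # monitor validation loss for early stopping
        p = 1.0 / (1.0 + np.exp(-(X_val @ w + b)))
        loss = -np.mean(y_val * np.log(p + 1e-12)
                        + (1 - y_val) * np.log(1 - p + 1e-12))
        if loss < best[0]:
            best, bad = (loss, w.copy(), b), 0
        else:
            bad += 1
            if bad >= patience:
                break
    _, w, b = best
    # instead of refitting on all of X from scratch, do one extra pass
    # over the validation data that was used to pick the stopping point
    w, b = sgd_logistic_step(w, b, X_val, y_val, lr)
    return w, b
```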

OTHER TIPS

There are two ways:

First:

When you split into x_train and x_test, you can carve a further 0.1 split out of x_train and keep it as a validation set x_dev:

x_train, x_test, y_train, y_test = train_test_split(data_x, data_y, test_size=0.25)

x_train, x_dev, y_train, y_dev = train_test_split(x_train, y_train, test_size=0.1)

clf = GridSearchCV(YourEstimator(), param_grid=param_grid)
clf.fit(x_train, y_train, x_dev=x_dev, y_dev=y_dev)

(Note that GridSearchCV.fit forwards extra fit parameters to the estimator as keyword arguments, so x_dev and y_dev must be passed by keyword; the same dev set is then used for every fold.)

Your estimator would then look like the following and implement early stopping with x_dev and y_dev:

class YourEstimator(BaseEstimator, ClassifierMixin):
    def __init__(self, param1, param2):
        # store hyperparameters under their own names (required for clone)
        self.param1 = param1
        self.param2 = param2

    def fit(self, x, y, x_dev=None, y_dev=None):
        # perform training with early stopping on (x_dev, y_dev)
        return self

Second:

Here you would not perform the second split on x_train yourself, but would take the dev set out inside the fit method of the estimator:

x_train, x_test, y_train, y_test = train_test_split(data_x, data_y, test_size=0.25)

clf = GridSearchCV(YourEstimator(), param_grid=param_grid)
clf.fit(x_train, y_train)

And your estimator will look like the following:

class YourEstimator(BaseEstimator, ClassifierMixin):
    def __init__(self, param1, param2):
        # store hyperparameters under their own names (required for clone)
        self.param1 = param1
        self.param2 = param2

    def fit(self, x, y):
        # split off a dev set internally, then train with early stopping
        x_train, x_dev, y_train, y_dev = train_test_split(x, y,
                                                          test_size=0.1)
        return self
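Putting the second approach together, a runnable sketch might look like this (EarlyStoppingClassifier and its full-batch logistic training loop are hypothetical stand-ins for the asker's ffnn; only the structure is the point):

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import train_test_split

class EarlyStoppingClassifier(BaseEstimator, ClassifierMixin):
    """Toy logistic model that splits a dev set off inside fit()
    and stops when the dev loss stops improving."""

    def __init__(self, lr=0.5, max_epochs=200, patience=5, val_fraction=0.1):
        self.lr = lr
        self.max_epochs = max_epochs
        self.patience = patience
        self.val_fraction = val_fraction

    def fit(self, X, y):
        # internal train/dev split, as in the second approach above
        X_tr, X_dev, y_tr, y_dev = train_test_split(
            X, y, test_size=self.val_fraction, random_state=0)
        w, b = np.zeros(X.shape[1]), 0.0
        best_loss, best_wb, bad = np.inf, (w, b), 0
        for _ in range(self.max_epochs):
            # one full-batch gradient step on the logistic loss
            p = 1.0 / (1.0 + np.exp(-(X_tr @ w + b)))
            w = w - self.lr * X_tr.T @ (p - y_tr) / len(y_tr)
            b = b - self.lr * np.mean(p - y_tr)
            # early stopping on the held-out dev loss
            pd = 1.0 / (1.0 + np.exp(-(X_dev @ w + b)))
            loss = -np.mean(y_dev * np.log(pd + 1e-12)
                            + (1 - y_dev) * np.log(1 - pd + 1e-12))
            if loss < best_loss:
                best_loss, best_wb, bad = loss, (w.copy(), b), 0
            else:
                bad += 1
                if bad >= self.patience:
                    break
        self.coef_, self.intercept_ = best_wb
        return self

    def predict(self, X):
        return (X @ self.coef_ + self.intercept_ >= 0).astype(int)
```

Because all hyperparameters are stored under their own names in __init__, the estimator survives clone() and can be dropped straight into GridSearchCV(EarlyStoppingClassifier(), param_grid=...) with a plain clf.fit(x_train, y_train).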
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow