There are two ways:
First:
When you make the x_train / x_test split, take a further 0.1 split from x_train and keep it as a validation set, x_dev:
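from sklearn.model_selection import GridSearchCV, train_test_split

x_train, x_test, y_train, y_test = train_test_split(data_x, data_y, test_size=0.25)
x_train, x_dev, y_train, y_dev = train_test_split(x_train, y_train, test_size=0.1)
clf = GridSearchCV(YourEstimator(), param_grid=param_grid)
# extra fit arguments must be passed by keyword; GridSearchCV forwards them to YourEstimator.fit
clf.fit(x_train, y_train, x_dev=x_dev, y_dev=y_dev)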
Your estimator would then look like the following, implementing early stopping with x_dev and y_dev:
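class YourEstimator(ClassifierMixin, BaseEstimator):
    def __init__(self, param1=None, param2=None):
        # perform initialization; the defaults let GridSearchCV build YourEstimator()
        self.param1 = param1
        self.param2 = param2

    def fit(self, x, y, x_dev=None, y_dev=None):
        # perform training with early stopping, monitoring x_dev / y_dev
        return self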
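To make the first approach concrete, here is a minimal runnable sketch. The class name EarlyStoppingSGD, its parameters (alpha, max_epochs, patience) and the synthetic data are all assumptions made for illustration; the patience-based stopping rule is just one simple way to do early stopping, not necessarily what your estimator needs.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, train_test_split


class EarlyStoppingSGD(ClassifierMixin, BaseEstimator):
    """Hypothetical estimator: SGD epochs with early stopping on a caller-supplied dev set."""

    def __init__(self, alpha=1e-4, max_epochs=50, patience=3):
        self.alpha = alpha
        self.max_epochs = max_epochs
        self.patience = patience

    def fit(self, x, y, x_dev=None, y_dev=None):
        self.classes_ = np.unique(y)
        self.model_ = SGDClassifier(alpha=self.alpha, random_state=0)
        best_score, stale = -np.inf, 0
        for epoch in range(self.max_epochs):
            # one pass over the training data
            self.model_.partial_fit(x, y, classes=self.classes_)
            self.n_epochs_ = epoch + 1
            if x_dev is None:
                continue  # no dev set given: run all epochs
            score = self.model_.score(x_dev, y_dev)
            if score > best_score:
                best_score, stale = score, 0
            else:
                stale += 1
                if stale >= self.patience:
                    break  # dev accuracy has not improved for `patience` epochs
        return self

    def predict(self, x):
        return self.model_.predict(x)


# synthetic data, only so the sketch runs end to end
data_x, data_y = make_classification(n_samples=400, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    data_x, data_y, test_size=0.25, random_state=0
)
x_train, x_dev, y_train, y_dev = train_test_split(
    x_train, y_train, test_size=0.1, random_state=0
)

clf = GridSearchCV(EarlyStoppingSGD(), param_grid={"alpha": [1e-4, 1e-2]}, cv=3)
clf.fit(x_train, y_train, x_dev=x_dev, y_dev=y_dev)  # dev set forwarded to fit()
test_accuracy = clf.score(x_test, y_test)
```

Note that with this approach the same x_dev is used for early stopping in every cross-validation fold. This sketch also keeps the weights from the last epoch rather than restoring the best ones.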
Second:
Skip the second split on x_train and instead take out the dev set inside the fit method of the estimator:
x_train, x_test, y_train, y_test = train_test_split(data_x, data_y, test_size=0.25)
clf = GridSearchCV(YourEstimator(), param_grid=param_grid)
clf.fit(x_train, y_train)
And your estimator will look like the following:
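class YourEstimator(ClassifierMixin, BaseEstimator):
    def __init__(self, param1=None, param2=None):
        # perform initialization; the defaults let GridSearchCV build YourEstimator()
        self.param1 = param1
        self.param2 = param2

    def fit(self, x, y):
        # hold out a dev set, then perform training with early stopping on it
        x_train, x_dev, y_train, y_dev = train_test_split(x, y, test_size=0.1)
        return self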
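A runnable sketch of the second approach could look like this. Again, InternalSplitSGD, its parameters (alpha, dev_size, max_epochs, patience) and the synthetic data are hypothetical names chosen for illustration, and the patience rule is only one possible stopping criterion.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, train_test_split


class InternalSplitSGD(ClassifierMixin, BaseEstimator):
    """Hypothetical estimator that holds out its own dev set inside fit()."""

    def __init__(self, alpha=1e-4, dev_size=0.1, max_epochs=50, patience=3):
        self.alpha = alpha
        self.dev_size = dev_size
        self.max_epochs = max_epochs
        self.patience = patience

    def fit(self, x, y):
        self.classes_ = np.unique(y)
        # carve the dev set out of whatever data fit() receives
        x_fit, x_dev, y_fit, y_dev = train_test_split(
            x, y, test_size=self.dev_size, stratify=y, random_state=0
        )
        self.model_ = SGDClassifier(alpha=self.alpha, random_state=0)
        best_score, stale = -np.inf, 0
        for epoch in range(self.max_epochs):
            self.model_.partial_fit(x_fit, y_fit, classes=self.classes_)
            self.n_epochs_ = epoch + 1
            score = self.model_.score(x_dev, y_dev)
            if score > best_score:
                best_score, stale = score, 0
            else:
                stale += 1
                if stale >= self.patience:
                    break  # dev accuracy has not improved for `patience` epochs
        return self

    def predict(self, x):
        return self.model_.predict(x)


# synthetic data, only so the sketch runs end to end
data_x, data_y = make_classification(n_samples=400, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    data_x, data_y, test_size=0.25, random_state=0
)

clf = GridSearchCV(InternalSplitSGD(), param_grid={"alpha": [1e-4, 1e-2]}, cv=3)
clf.fit(x_train, y_train)  # no extra arguments needed
test_accuracy = clf.score(x_test, y_test)
```

Here every cross-validation fold gets its own dev set, taken from that fold's training portion, and the call to GridSearchCV.fit stays standard. The cost is that each fit trains on slightly less data.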