Question

I used stratified k-fold cross-validation to train my model. Below is the Python code:

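>from sklearn.ensemble import GradientBoostingClassifier
>from sklearn.metrics import classification_report
>from sklearn.model_selection import StratifiedKFold
>
>def stratified_cv_v1(X, y, clf, shuffle=True, n=10):
>    # X and y are assumed to be NumPy arrays; clf is a classifier class, not an instance
>    stratified_k_fold = StratifiedKFold(n_splits=n, shuffle=shuffle)
>    y_pred_v1 = y.copy()
>    for ii, jj in stratified_k_fold.split(X, y):
>        X_train, X_test = X[ii], X[jj]
>        y_train = y[ii]
>        clf_v2 = clf()  # fresh, untrained model for each fold
>        clf_v2.fit(X_train, y_train)
>        # store the out-of-fold predictions from the model fitted on this fold
>        y_pred_v1[jj] = clf_v2.predict(X_test)
>    return y_pred_v1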


>print(classification_report(y, stratified_cv_v1(X, y, GradientBoostingClassifier)))

Now, how do I deploy the model to make predictions on a new data set?

Solution

k-fold CV is meant to evaluate a modelling method, not to produce the model you deploy: each fold trains a separate model that is discarded once it has been scored. Once the evaluation is done and you are ready to move to deployment, there is no point in using CV anymore. The method has been tested and validated, so you can reasonably assume that applying the same method to the same kind of data will give a similar level of performance. Thus the usual process is:

  1. Train a final model on the full dataset (no CV, no testing)
  2. Apply the model to new instances
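
The two steps above could look roughly like the following. This is only a minimal sketch: the data is synthetic (generated with `make_classification`), and the names `final_model`, `X_new` and `y_new_pred` are made up for illustration. In practice `X` and `y` would be the full labelled dataset used during CV, and `X_new` the new instances, prepared with the same features and preprocessing.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in data (an assumption, for illustration only).
X_all, y_all = make_classification(n_samples=300, n_features=10, random_state=0)

# Pretend the first 250 rows are the labelled dataset that was used during CV
# and the last 50 rows are new, unlabelled instances that arrive later.
X, y = X_all[:250], y_all[:250]
X_new = X_all[250:]

# Step 1: train the final model on the full labelled dataset (no CV, no hold-out).
final_model = GradientBoostingClassifier(random_state=0)
final_model.fit(X, y)

# Step 2: apply the model to the new instances.
y_new_pred = final_model.predict(X_new)
print(y_new_pred[:10])
```

If the predictions have to be made in a separate process or at a later time, the fitted `final_model` can be saved to disk and loaded again where it is needed, for example with `joblib`.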
Licensed under: CC-BY-SA with attribution