What should I use as training data for base (level 1) classifiers in ensembling?

https://datascience.stackexchange.com/questions/66370

20-10-2020

Question

Can I just take all the training data I have, train the base models on it, and then use their predictions to train the level 2 model? Is this good practice, or should it be done differently?

Solution

You can do that, but the level 2 model will probably not generalize well. You should not train it on base-model predictions for data that was also used to fit the base models, because those predictions tend to look better than the base models really are on unseen data. Instead, obtain the base-model predictions for the training data using cross-validation. This is called "model stacking".
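To make the difference concrete, here is a minimal sketch assuming scikit-learn is available. The synthetic dataset, the decision tree, and the 5-fold setting are illustrative choices, not part of the original answer. It compares predictions made on the rows a model was fitted on with predictions made by cross-validation on rows the model did not see.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% label noise (illustrative assumption).
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)

tree = DecisionTreeClassifier(random_state=0)

# In-sample: predict the same rows the tree was fitted on.
in_sample_pred = tree.fit(X, y).predict(X)
in_sample_acc = (in_sample_pred == y).mean()

# Out-of-fold: each row is predicted by a tree fitted on the other folds only.
oof_pred = cross_val_predict(tree, X, y, cv=5)
oof_acc = (oof_pred == y).mean()

print(f"in-sample accuracy:   {in_sample_acc:.3f}")
print(f"out-of-fold accuracy: {oof_acc:.3f}")
```

An unpruned tree memorizes its training rows, so its in-sample predictions carry almost no information about how it behaves on new data. A level 2 model trained on them would learn to trust the tree far more than it should.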

This page has a good explanation:

Split your training data into subsets (folds). For each subset, fit the base model on all the other subsets and predict the target for the held-out subset.

Fit the base model on the whole training data and predict the target for the test set.

Do this for multiple base models. Now you have train and test set predictions for each base model (the original example uses two base models).

Fit an ensemble model on the base training predictions and evaluate the performance on the base test predictions.
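The steps above can be sketched roughly as follows, assuming scikit-learn and NumPy. The dataset, the two base models, the logistic regression used as level 2 model, and names such as `train_meta` and `test_meta` are hypothetical choices made for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative synthetic data, split into a training and a test set.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Two base (level 1) models, chosen arbitrarily for this sketch.
base_models = [
    RandomForestClassifier(n_estimators=50, random_state=0),
    KNeighborsClassifier(n_neighbors=5),
]

train_meta, test_meta = [], []
for model in base_models:
    # Out-of-fold class-1 probabilities for every training row.
    oof = cross_val_predict(
        model, X_train, y_train, cv=5, method="predict_proba"
    )[:, 1]
    train_meta.append(oof)

    # Refit on the whole training set, then predict the test set.
    model.fit(X_train, y_train)
    test_meta.append(model.predict_proba(X_test)[:, 1])

# One column of predictions per base model.
train_meta = np.column_stack(train_meta)
test_meta = np.column_stack(test_meta)

# Level 2 model: fitted on the out-of-fold predictions, scored on the test predictions.
meta_model = LogisticRegression()
meta_model.fit(train_meta, y_train)
stacked_acc = meta_model.score(test_meta, y_test)
print(f"stacked test accuracy: {stacked_acc:.3f}")
```

If you do not need the intermediate arrays, scikit-learn's `StackingClassifier` (and `StackingRegressor`) follow the same idea: the final estimator is trained on cross-validated predictions of the base estimators, controlled by the `cv` parameter.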

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange