Question

Can I just take all the training data I have, train the base models on it, and then use their predictions to train a level-2 model? Is this good practice, or should it be done differently?

Solution

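You can do that, but the level-2 model will not generalize well. You should not train it on base-model predictions for the same data that was used to fit the base models, because those predictions look better than the base models really are on unseen data. Instead, obtain the base-model predictions for the training data using cross-validation, so that every prediction comes from a model that never saw that row. This is called "model stacking".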
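
To make the difference concrete, here is a minimal sketch, assuming scikit-learn and a synthetic dataset (the data, the random forest, and all variable names are illustrative choices, not part of the original answer). It compares predictions made on rows the base model was fit on with out-of-fold predictions from cross-validation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

# Synthetic data, used here only for illustration (an assumption).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

base = RandomForestClassifier(n_estimators=100, random_state=0)

# In-sample: the model predicts the same rows it was fit on.
in_sample = base.fit(X, y).predict_proba(X)[:, 1]

# Out-of-fold: each row is predicted by a model that never saw it.
out_of_fold = cross_val_predict(base, X, y, cv=5, method="predict_proba")[:, 1]

in_sample_acc = np.mean((in_sample > 0.5) == y)
oof_acc = np.mean((out_of_fold > 0.5) == y)
print(f"in-sample accuracy:   {in_sample_acc:.3f}")
print(f"out-of-fold accuracy: {oof_acc:.3f}")
```

The in-sample number is typically close to perfect for a flexible model like this, which is exactly why a level-2 model trained on such predictions learns to trust the base model too much.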

This page has a good explanation:

  1. Split your training data into subsets (folds). For each subset, fit the base model on all the other subsets and predict the target for the held-out subset.

  2. Fit the base model on the whole training data and predict the target for the test set.

  3. Do this for multiple base models. Now you have train and test set predictions for each base model. The example this is taken from uses two base models.

  4. Fit an ensemble (level-2) model on the base models' training-set predictions and evaluate its performance on their test-set predictions.
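
The steps above can be sketched roughly as follows. This is only an illustration under assumptions: it uses scikit-learn, a synthetic dataset, and two arbitrarily chosen base models (a random forest and k-nearest neighbours) with logistic regression as the level-2 model. None of these choices come from the original answer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data and a held-out test set (illustrative assumptions).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Two hypothetical base models, mirroring the two-model example.
base_models = [
    RandomForestClassifier(n_estimators=100, random_state=0),
    KNeighborsClassifier(n_neighbors=15),
]

train_meta, test_meta = [], []
for model in base_models:
    # Step 1: out-of-fold predictions for the training rows.
    oof = cross_val_predict(
        model, X_train, y_train, cv=5, method="predict_proba"
    )[:, 1]
    train_meta.append(oof)
    # Step 2: refit on all training data, then predict the test set.
    model.fit(X_train, y_train)
    test_meta.append(model.predict_proba(X_test)[:, 1])

# Step 3: one column of predictions per base model.
train_meta = np.column_stack(train_meta)
test_meta = np.column_stack(test_meta)

# Step 4: fit the level-2 model on the base predictions and evaluate it.
meta_model = LogisticRegression()
meta_model.fit(train_meta, y_train)
stacked_acc = meta_model.score(test_meta, y_test)
print(f"stacked test accuracy: {stacked_acc:.3f}")
```

If you would rather not write the loop yourself, scikit-learn's `StackingClassifier` and `StackingRegressor` implement the same idea, training the final estimator on cross-validated predictions of the base estimators.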

Licensed under: CC-BY-SA with attribution