What should I use as training data for base (level 1) classifiers in ensembling?
20-10-2020
Question
Can I just take all the training data I have, train the base models on it, and then use their predictions to train the level 2 model? Is this good practice, or should it be done differently?
Solution
You can do that, but the level 2 model will not generalize well. You should not use base-model predictions on data that was used to fit the base model: those predictions are overly optimistic, so the level 2 model learns to trust the base models more than it should. Instead, obtain the base-model predictions for the training data using cross-validation. This is called "model stacking".
This page has a good explanation:
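- Split your training data into k subsets (folds). For each subset, fit the base model on all the other subsets and predict the target for the held-out subset. These out-of-fold predictions are the training features for the level 2 model.
- Fit the base model on the whole training data and predict the target for the test set. These predictions are the level 2 model's features for the test set.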
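The two steps above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not a definitive recipe: the synthetic dataset, the choice of base models, the 5-fold split, and the logistic-regression level 2 model are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary-classification data (assumption, for illustration only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Level 1 (base) models; any classifiers with predict_proba would do.
base_models = [
    RandomForestClassifier(n_estimators=50, random_state=0),
    KNeighborsClassifier(n_neighbors=5),
]

train_meta_cols = []
test_meta_cols = []
for model in base_models:
    # Step 1: out-of-fold predictions. Each training row is predicted by a
    # copy of the model that never saw that row during fitting.
    oof = cross_val_predict(
        model, X_train, y_train, cv=5, method="predict_proba"
    )[:, 1]
    train_meta_cols.append(oof)

    # Step 2: refit on the whole training set and predict the test set.
    model.fit(X_train, y_train)
    test_meta_cols.append(model.predict_proba(X_test)[:, 1])

# One column per base model.
train_meta = np.column_stack(train_meta_cols)
test_meta = np.column_stack(test_meta_cols)

# Level 2 model is trained only on out-of-fold predictions.
meta_model = LogisticRegression()
meta_model.fit(train_meta, y_train)
stacked_accuracy = meta_model.score(test_meta, y_test)
print(f"Stacked test accuracy: {stacked_accuracy:.3f}")
```

scikit-learn also ships `StackingClassifier` in `sklearn.ensemble`, which builds the level 2 training set from cross-validated base-model predictions for you, so the manual loop is mainly useful for seeing what happens at each step.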
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange