Question

From what I understand about model stacking: the meta estimator is trained to combine the N base models' predictions so as to fit the ground truth. Once trained, it combines the 1st-level outputs to approximate the ground truth.

The meta estimator is a model of the form: $(y_{\text{pred1}}, y_{\text{pred2}}, y_{\text{pred3}}) \rightarrow y_{\text{pred-stack}}$

So the combination is based only on the values of the 1st-level predictions. However, each row of the stacking data is also linked to other attributes: "Brand", "Model", "Power". Why wouldn't we use those attributes to determine the optimal combination? For example, if model 1 is the best whenever "Brand" is NaN, the meta-model would learn this and route every row with a missing brand to model 1.

So the meta estimator I propose is as follows: $(y_{\text{pred1}}, y_{\text{pred2}}, y_{\text{pred3}}, \text{brandIsNull}) \rightarrow y_{\text{pred-stack}}$
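
Concretely, here is a rough sketch of the meta-features I have in mind (all names and values below are purely illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Purely illustrative out-of-fold predictions from the three 1st-level models,
# plus the extra attribute derived from the original data.
y_pred1 = np.array([10.2, 11.5, 9.8, 14.0])
y_pred2 = np.array([10.0, 12.1, 9.5, 13.6])
y_pred3 = np.array([10.4, 11.9, 9.9, 14.2])
brand_is_null = np.array([1, 0, 1, 0])   # 1 when "Brand" is NaN for that row
y_true = np.array([10.1, 12.0, 9.7, 13.9])

# Meta-features = 1st-level predictions + the extra attribute
meta_X = pd.DataFrame({
    "y_pred1": y_pred1,
    "y_pred2": y_pred2,
    "y_pred3": y_pred3,
    "brandIsNull": brand_is_null,
})

# The meta estimator is then trained to map these to y_pred_stack
meta_model = LinearRegression().fit(meta_X, y_true)
```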

  • Does this approach exist?
  • If not, would it be a good or bad idea?

Solution

The rationale for stacking learners is to combine the strengths of the individual learners.

  • On the one hand, your idea makes sense: if the learners have different strengths, adding some features might help the meta-model detect when to give more weight to a particular learner. It is entirely possible that the meta-model will perform better this way on some problems (a sketch of this idea in scikit-learn follows after this list).
  • On the other hand, generalizing this idea would often defeat the purpose of stacking:
    • if we know in advance that a particular model is particularly good in some specific, identifiable cases, then it is likely optimal to switch entirely to that model in those cases (no stacking needed).
    • if we don't have any such knowledge, then we would need to add many (or all) of the original features to the meta-model, just in case they help. But by adding features, it is likely that (1) the meta-model will try to use the features directly, collapsing into an individual learner itself, and (2) the meta-model will be more complex, hence more prone to overfitting and less likely to make the best use of the individual learners' outputs.
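
To answer the "does this approach exist?" part: yes, scikit-learn's StackingRegressor (and StackingClassifier) exposes a `passthrough` flag which, when set to True, feeds the original features to the meta-model alongside the base learners' predictions. Below is a minimal sketch; the data and model choices are purely illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge, LinearRegression

# Toy data: "Power" is numeric, "brandIsNull" flags a missing "Brand"
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Power": rng.uniform(50, 300, size=200),
    "brandIsNull": rng.integers(0, 2, size=200),
})
y = 1.5 * X["Power"] + 20 * X["brandIsNull"] + rng.normal(0, 5, size=200)

base_learners = [
    ("ridge", Ridge()),
    ("forest", RandomForestRegressor(n_estimators=50, random_state=0)),
]

# passthrough=False: the meta-model sees only (y_pred_ridge, y_pred_forest)
# passthrough=True : the meta-model also sees the original columns Power, brandIsNull
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=LinearRegression(),
    passthrough=True,
)
stack.fit(X, y)
print(stack.predict(X.iloc[:5]))
```

Whether passing through the original features actually helps is exactly the trade-off described above; in practice it is worth comparing both settings with cross-validation.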

To sum up: the stacking approach relies on "making things simple" for the meta-model, so that it only has to arbitrate between the answers of the individual learners. This way the meta-model can "focus" solely on using these answers optimally. The more features we add to it, the greater the risk that it will not do its job correctly.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange