Question

I am wondering why my stacked features do not improve my loss metric. Here is what I'm doing: I add new features, which are simply the predictions of other models (obtained by calling train and then predict), to the original train/test features. Every time I have tried this method, it has failed. I am curious what the issue could be. Can anyone give me some advice?


Solution

As far as I understand, stacking does not add features to the original data set. The point is to train several base models on the training data and then use their predictions on the training data as the input features of another model (the meta-model).
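A minimal sketch of that layout, assuming scikit-learn and a synthetic binary data set. One detail is swapped in here as a common practice rather than taken from the answer: the meta-model's training features are *out-of-fold* predictions from `cross_val_predict`, not predictions from base models that were fitted on those same rows. In-sample predictions tend to look far more accurate than the base models really are, so the meta-model learns to trust them too much; this is a frequent reason stacked features fail to help on test data. The choice of models and all data parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic binary classification data (illustrative assumption).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

base_learners = [
    RandomForestClassifier(n_estimators=100, random_state=0),
    GradientBoostingClassifier(random_state=0),
]

# Meta-model training features: out-of-fold positive-class probabilities.
# Each row's prediction comes from a clone of the model that never saw that row.
meta_train = np.column_stack([
    cross_val_predict(model, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for model in base_learners
])

# Meta-model test features: each base learner refitted on the full training set.
meta_test = np.column_stack([
    model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for model in base_learners
])

# The meta-model sees only the base learners' predictions, not the original features.
meta_model = LogisticRegression().fit(meta_train, y_train)
stacked_proba = meta_model.predict_proba(meta_test)[:, 1]
print("stacked log loss:", log_loss(y_test, stacked_proba))
```

The base learners are trained on the original features only; the original columns never reach the meta-model.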

The first constructions of this kind used logistic regression as the final (meta) model, with the class probabilities from each base learner as its input features. What I have described is the technical layout; the intuition behind it is the following: since no single model performs well over the entire joint probability space of the features, one can combine the results of several models to get the best from each. In other words, we exploit the richness of the models (seen as function spaces) to obtain a combined model. This strategy does not always work, but it often does.
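The same construction, with logistic regression as the final model and class probabilities as its inputs, can be sketched with scikit-learn's `StackingClassifier`. This assumes a synthetic three-class problem and a random forest plus gradient boosting as base learners; those choices are illustrative, not prescribed by the answer. `stack_method="predict_proba"` feeds probabilities rather than labels, and `passthrough=False` keeps the original features out of the meta-model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic three-class data (illustrative assumption).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # the ensemble (meta) model
    stack_method="predict_proba",  # class probabilities, not hard labels
    passthrough=False,             # do not append the original features
    cv=5,                          # meta-model is trained on out-of-fold predictions
)
stack.fit(X_train, y_train)

# Inputs the meta-model sees: 3 class probabilities from each of the 2 base learners.
meta_features = stack.transform(X_test)
accuracy = stack.score(X_test, y_test)
print(meta_features.shape, "accuracy:", accuracy)
```

With three classes, each base learner contributes one probability column per class, so the meta-model receives six columns. In the binary case scikit-learn drops one redundant column per learner.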

I suspect you are doing something wrong. I think it is better to use only the original features for the base learners. If possible, use scores or probabilities from the base learners instead of their final classifications; this leaves more room for improvement. It is also often better to stack learners from different families rather than the same model with different parameters (for example, a gradient boosting model and a random forest rather than two gradient boosting models). None of this advice consists of rules that cannot be broken, and even if you follow all of it, there is no guarantee of improvement.
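A rough way to check these heuristics on your own data is to compare a few stack variants on a held-out split. The sketch below assumes scikit-learn and a synthetic binary data set; the helper `stacked_log_loss` and the particular model configurations are hypothetical, chosen only for illustration. It contrasts hard labels (`"predict"`) with probabilities (`"predict_proba"`), and learners from different families with two gradient boosting models that differ only in depth. Which variant wins depends on the data, so the code reports the numbers rather than assuming an outcome.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic, slightly noisy binary data (illustrative assumption).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8, flip_y=0.05, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)


def stacked_log_loss(estimators, stack_method):
    """Hypothetical helper: fit a stack on the original features, return held-out log loss."""
    stack = StackingClassifier(
        estimators=estimators,
        final_estimator=LogisticRegression(),
        stack_method=stack_method,
        cv=5,
    )
    stack.fit(X_train, y_train)
    return log_loss(y_test, stack.predict_proba(X_test))


# Learners from different families.
mixed = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
]
# Same family, differing only in a hyperparameter.
same_family = [
    ("gb_shallow", GradientBoostingClassifier(max_depth=2, random_state=0)),
    ("gb_deep", GradientBoostingClassifier(max_depth=4, random_state=0)),
]

results = {
    (name, method): stacked_log_loss(estimators, method)
    for name, estimators in [("mixed", mixed), ("same_family", same_family)]
    for method in ["predict", "predict_proba"]
}
for key, loss in sorted(results.items()):
    print(key, round(loss, 4))
```

Lower log loss is better. A single split like this is noisy, so repeating the comparison over several seeds or folds gives a more reliable picture.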

Licensed under: CC-BY-SA with attribution