Problem

I am using Gradient Boosted Trees (with CatBoost) for a regression task. Can GB trees predict a label that is below the minimum (or above the maximum) seen in training? For instance, if the minimum value of the label is 10, would GB trees be able to predict 5?

Thanks for your help!


Solution

Yes, gradient boosted trees can make predictions outside the training labels' range. Here's a quick example:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_classification(random_state=42)  # labels are only 0 and 1

# Fit a regression model to the binary labels, with shallow trees and a
# learning rate of 1.
gbm = GradientBoostingRegressor(max_depth=1,
                                n_estimators=10,
                                learning_rate=1,
                                random_state=42)
gbm.fit(X, y)
preds = gbm.predict(X)
print(preds.min(), preds.max())  # the predicted range extends beyond [0, 1]

This outputs -0.010418732339562916 1.134566081403055 (while make_classification produces labels of only 0 and 1).

Now, this example is unrealistic for a number of reasons: I'm using a regression model on a classification problem, a learning rate of 1, a depth of only 1, no regularization, etc. All of these could be made more sensible and we could still find an example with predictions outside the training range, but it would be harder to construct. I would say that in practice you're unlikely to get anything very far from the training range.
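Since the question asks about CatBoost specifically, here is a rough analogous sketch, assuming the catboost package is installed. The exact numbers will vary, and how far the predictions step outside [0, 1] depends on the data, so treat it as an experiment rather than a guaranteed result:

from sklearn.datasets import make_classification
from catboost import CatBoostRegressor

X, y = make_classification(random_state=42)  # labels are only 0 and 1

# Same spirit as the sklearn example above: shallow trees, learning rate 1.
cb = CatBoostRegressor(iterations=10, depth=1, learning_rate=1,
                       random_seed=42, verbose=0)
cb.fit(X, y)
preds = cb.predict(X)
print(preds.min(), preds.max())  # check whether the range extends past [0, 1]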

See the (more theoretical) example in this comment of an xgboost github issue, found via this cv.se post.


To be clear: decision trees, random forests, and adaptive boosting cannot make predictions outside the training range; this behavior is specific to gradient boosted trees.

Other tips

A decision tree's prediction will lie within the limits of the target: either the record ends up in a leaf holding a single target value (when the depth is not controlled), or the leaf's prediction is the average of multiple target values. In the second case too, the prediction cannot cross the limits of the target.
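A minimal sketch of that claim, assuming scikit-learn (the parameter values here are arbitrary): leaf predictions are averages of training targets, so they never leave the target range.

from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(random_state=0)

# Each leaf predicts the mean of the training targets that reach it,
# so every prediction is bounded by the training targets' range.
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
preds = tree.predict(X)
print(y.min() <= preds.min(), preds.max() <= y.max())  # True True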

Coming to Ensembling -

Bagging -
Bagging simply averages the predictions of multiple trees, so again the prediction will remain within the target's range (a sketch comparing all three ensembles follows after the Gradient Boosting item below).


Adaptive boosting
Here we re-weight the records for each successive tree. This does not change the range of any individual tree's prediction, and the final prediction is a weighted combination of all the trees, so again it remains within the target's range.


Gradient Boosting
Here we add each new tree based on the prediction error of the previous trees.
In very simple terms, suppose the target is 100. The first tree predicts 70, so the second tree is trained on the remaining 30; suppose it predicts 20. Growing many trees this way, the predictions add up as
70 + 20 + 6 + 2 + 1 + 0.5 + 0.2 + ......
and the sum will not cross 100.
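A rough sketch contrasting the three ensembles above, assuming scikit-learn (the model settings are illustrative): bagging and adaptive boosting combine raw tree predictions, while gradient boosting sums residual corrections, so only the last can drift outside the targets' range.

from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)

X, y = make_regression(random_state=0)

models = {
    "bagging (random forest)": RandomForestRegressor(random_state=0),
    "adaptive boosting": AdaBoostRegressor(random_state=0),
    "gradient boosting": GradientBoostingRegressor(max_depth=1, learning_rate=1.0,
                                                   random_state=0),
}

for name, model in models.items():
    preds = model.fit(X, y).predict(X)
    # Compare each ensemble's prediction range with the training target range.
    print(f"{name}: preds [{preds.min():.1f}, {preds.max():.1f}] "
          f"vs targets [{y.min():.1f}, {y.max():.1f}]")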

Edit, after Ben's comment -

The above logic (for gradient boosting) will not hold if your learning rate is too high, because then the residual grows with every new tree and the prediction can reach any value.
Gradient boosting performs gradient descent on the function itself, so the target for the next tree depends on the residual and the learning rate; with a learning rate above 1 and enough trees, the value blows up.
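As a toy illustration of that blow-up (an idealized sketch, assuming each new tree fits the current residual exactly): the residual update is r → r − lr·r = (1 − lr)·r, so its magnitude shrinks when lr < 1 and grows geometrically when lr > 1.

# Idealized residual recursion: each tree is assumed to fit the residual exactly.
for lr in (0.5, 2.1):
    residual = 100.0               # starting error, as in the 100-target example
    for _ in range(10):            # ten boosting rounds
        residual -= lr * residual  # r -> (1 - lr) * r
    print(lr, residual)
# lr=0.5: residual shrinks toward 0
# lr=2.1: residual grows to roughly 100 * (-1.1)**10 ≈ 259 after 10 rounds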

See the code snippet below with learning_rate=2.1 and 100 trees, where a maximum target of roughly 398 becomes a prediction of about 1.5 million (the print shows preds.min, y.min, preds.max, y.max):

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Note: no fixed random_state for the data, so the exact numbers vary between runs.
X, y = make_regression()

# A learning rate above 1 lets the residuals, and hence the predictions, blow up.
model = GradientBoostingRegressor(max_depth=1, n_estimators=100,
                                  learning_rate=2.1, random_state=42)

model.fit(X, y)
preds = model.predict(X)
print(preds.min(), y.min(), preds.max(), y.max())

-1246776.29 || -487.87 || 1586302.24 || 398.12

With n_estimators=10, the predictions have not blown up yet; more trees are needed for the effect to compound:

-277.83 || -393.27 || 118.32 || 594.82

Hence, the answer to your question is Yes, at least theoretically; in practice we mostly keep the learning rate below 1.0 for smooth learning, so predictions rarely stray far outside the training range.

For CatBoost (gradient boosting) I don't know, but for decision trees and random forests the answer is no.

The final prediction is based on the "mean" of the instances that fell into the leaf. I say "mean", but it is not necessarily the mean. For a random forest, it is the mean of those means.

Now to your question: can the predicted value be bigger than the maximum value seen in training? For decision trees - No; for random forests - No; for gradient boosting - I don't know; for linear models - Yes.
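For the linear-model case, a minimal sketch assuming scikit-learn (the data here is made up for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([10.0, 20.0, 30.0])  # training targets: maximum is 30

model = LinearRegression().fit(X, y)
# Extrapolating far beyond the training inputs gives a prediction of about 100,
# well above the maximum target seen in training.
print(model.predict(np.array([[10.0]])))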
