Question

I'm working on a project with a classification problem that I need to solve using gradient boosted decision trees. I want to build a matrix that holds the prediction of each decision tree for each sample. For example, with 100 samples and 100 trees, I should get a 100x100 matrix whose (i, j)th entry is the prediction of the jth tree for the ith sample.

I'm using sklearn, and the problem is that I can't get the prediction of each individual tree.

So far I tried:

newgb=gb.estimators_[0][0].fit(X_train, y_train)
print(newgb.score(X_train, y_train))

where gb is an already fitted model. From what I understood of the sklearn documentation

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html#sklearn.ensemble.GradientBoostingRegressor.staged_predict

.estimators_ should return a (number-of-trees x 1) matrix in which each entry is a tree used by the model. With gb.estimators_[0][0] I tried to access the first tree and get its predictions with score. What I get as output is:

[0.12048193 0.95       0.95       0.95       0.95       0.95
 0.95       0.95       0.95       0.95       0.12048193 0.95
 0.95       0.95       0.12048193 0.12048193 0.12048193 0.12048193
...]

None of them is 1 or 0, as they should be (it is binary classification), and the values repeat themselves, like 0.95 and 0.12. I didn't use any likelihood function either, so .score() is supposed to give me only 1's and 0's.
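
For context, here is a minimal sketch of what the entries of .estimators_ look like. It assumes gb is a GradientBoostingClassifier; the synthetic data from make_classification is only a stand-in for X_train and y_train, which are not shown in the question. It suggests that each entry is a regression tree whose predict gives real-valued outputs and whose score gives a single R^2 value, rather than 0/1 labels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the question's X_train / y_train (an assumption).
X, y = make_classification(n_samples=100, n_features=10, random_state=0)
gb = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X, y)

# Binary classification: one regression tree per boosting stage.
print(gb.estimators_.shape)

first_tree = gb.estimators_[0, 0]
print(type(first_tree).__name__)

# predict() returns real-valued contributions to the raw score, not labels.
raw = first_tree.predict(X)
print(raw[:5])

# score() on a regressor returns one number (R^2), not per-sample values.
r2 = first_tree.score(X, y)
print(r2)
```

Note that this sketch does not call fit on the extracted tree. Refitting it on y_train, as in the attempt above, replaces the tree learned during boosting with a new one trained directly on the labels.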

I don't know how to get the predictions of each individual tree, and I don't know what I'm doing wrong either.
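
One possible way to build the samples-by-trees matrix described above is sketched below. It is not a confirmed solution. It again assumes a GradientBoostingClassifier on a binary problem and uses synthetic data in place of the question's X_train and y_train. The names per_tree and staged are made up for illustration. The per-tree values are real-valued, because each tree in gradient boosting fits a correction to the running raw score. If 0/1 labels are wanted, staged_predict gives the ensemble's class prediction after each stage, which is cumulative rather than per tree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the question's X_train / y_train (an assumption).
X, y = make_classification(n_samples=100, n_features=10, random_state=0)
gb = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X, y)

# Column j holds the raw output of tree j for every sample.
# The already fitted trees are used as they are, without refitting.
per_tree = np.column_stack([stage[0].predict(X) for stage in gb.estimators_])
print(per_tree.shape)

# Column j holds the ensemble's class prediction using trees 0..j (cumulative).
staged = np.column_stack(list(gb.staged_predict(X)))
print(staged.shape)
```

With 100 samples and 100 trees both arrays are 100x100, and the last column of staged matches gb.predict(X).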

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange