Question

I was trying to build a 0-1 classifier using the xgboost R package. My question is: how are predictions made? For example, in random forests the trees "vote" for each option and the final prediction is based on the majority. For xgboost, the regression case is simple, since the prediction of the whole model is the sum of the predictions of the weak learners (boosted trees), but what about classification?

Does the xgboost classifier work the same way as a random forest? (I don't think so, since it can return predicted probabilities, not class memberships.)


Solution

The gradient boosting algorithm creates a set of decision trees.

The prediction process uses these steps:

  • for each tree, compute a temporary "predicted variable" by applying the tree to the new data set.
  • use a formula to aggregate all these trees. Depending on the model:
    • bernoulli: 1/(1 + exp(-(intercept + SUM(temporary pred))))
    • poisson, gamma: exp(intercept + SUM(temporary pred))
    • adaboost: 1/(1 + exp(-2*(intercept + SUM(temporary pred))))

The temporary "predicted variable" is not a probability by itself: it lives on the link scale (log-odds for bernoulli) and has no meaning on its own until the trees are summed and the formula above is applied.
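
To make this concrete, here is a minimal sketch in R with xgboost (using the agaricus demo data that ships with the package): the probability returned by predict() for a binary:logistic model is exactly the bernoulli formula above applied to the summed tree outputs (the margin).

    library(xgboost)

    # demo data shipped with the xgboost package
    data(agaricus.train, package = "xgboost")
    dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

    bst <- xgb.train(params = list(objective = "binary:logistic", max_depth = 2, eta = 0.3),
                     data = dtrain, nrounds = 10)

    margin <- predict(bst, dtrain, outputmargin = TRUE)  # intercept + SUM(temporary pred)
    prob   <- predict(bst, dtrain)                        # the probability the model returns

    all.equal(prob, 1 / (1 + exp(-margin)))               # TRUE, up to floating point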

The more trees you have, the smoother your prediction (since each tree spreads only a finite set of values across your observations).

The actual R implementation is no doubt more optimised than this description, but it is enough to understand the concept.

In the h2o implementation of gradient boosting, the output is a 0/1 flag. I think the threshold that maximises the F1 score is used by default to convert the probability into the flag. I'll do some searching/testing to confirm that.
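
Here is a rough sketch in R with h2o, on hypothetical toy data (the column names and sizes are my own invention): the predict column is the 0/1 flag, and the threshold it is derived from can be pulled out of the performance object.

    library(h2o)
    h2o.init()

    # hypothetical toy data: a binary outcome and two numeric predictors
    train <- as.h2o(data.frame(y  = as.factor(sample(0:1, 200, replace = TRUE)),
                               x1 = rnorm(200),
                               x2 = rnorm(200)))

    gbm <- h2o.gbm(x = c("x1", "x2"), y = "y", training_frame = train, ntrees = 50)

    h2o.predict(gbm, train)                        # columns: predict (0/1 flag), p0, p1

    perf <- h2o.performance(gbm, train)
    h2o.find_threshold_by_max_metric(perf, "f1")   # max-F1 threshold (the default, I believe)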

In that same implementation, one of the default outputs for a binary outcome is a confusion matrix, which is a great way to assess your model (and opens up a whole new set of questions).
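
Continuing the same sketch, the confusion matrix is one call away; by default it is computed at that same max-F1 threshold, as far as I understand.

    h2o.confusionMatrix(gbm)          # training confusion matrix at the default threshold
    h2o.confusionMatrix(gbm, train)   # or on any other H2OFrame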

The intercept is "the initial predicted value to which trees make adjustments". Basically, just an initial adjustment.
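
In xgboost the equivalent knob is, I believe, base_score (my assumption is that it plays exactly this intercept role); for binary:logistic it is given on the probability scale, so its contribution to the margin is its logit.

    qlogis(0.5)   # 0, so with the default base_score of 0.5 the margin is just the tree sum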

In addition: h2o.gbm documentation

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange