Question

I am having a hard time understanding how to format and use the output of predict.gbm ('gbm' package) with the multiclass.roc function ('pROC' package).

I used a multinomial gbm to predict a validation dataset. The output appears to be the probability of each data point belonging to each factor level. (Correct me if I am wrong.)

preds2 <- predict.gbm(density.tc5.lr005, ProxFiltered, n.trees=best.iter, type="response")

> head(as.data.frame(preds2))
      1.2534     2.2534     3.2534      4.2534       5.2534
1 0.62977743 0.25756095 0.09044278 0.021497259 7.215793e-04
2 0.16992912 0.24545691 0.45540153 0.094520208 3.469224e-02
3 0.02633356 0.06540245 0.89897614 0.009223098 6.474949e-05

The factor levels are 1-5; I am not sure why a decimal suffix is appended to the column names.

I am trying to compute the multi-class AUC as defined by Hand and Till (2001) using multiclass.roc but I'm not sure how to supply the predicted values in the single vector it requires.

I can try to work up an example if necessary, though I assume this is routine for some and I am missing something as a novice with the procedure.


Solution

Pass in the response variable as-is, and use the most likely class (the index of the highest predicted probability in each row) as the predictor:

# Predictor = index of the most probable class for each row of preds2 (requires library(pROC))
multiclass.roc(ProxFiltered$response_variable, apply(preds2, 1, which.max))
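
To see what that call operates on, here is a small sketch with a made-up array standing in for preds2. It assumes, as the column names in the question suggest, that predict.gbm returns a three-dimensional array for a multinomial model (observations by classes by n.trees values). The names preds_demo, prob_matrix and pred_class, and all the numbers, are invented for illustration.

```r
# Hypothetical stand-in for preds2: 3 observations x 5 classes x 1 n.trees value.
# The probabilities are rounded, made-up numbers, not real model output.
probs <- c(0.63, 0.17, 0.03,   # class 1
           0.26, 0.25, 0.07,   # class 2
           0.09, 0.45, 0.89,   # class 3
           0.02, 0.09, 0.01,   # class 4
           0.00, 0.04, 0.00)   # class 5
preds_demo <- array(probs, dim = c(3, 5, 1),
                    dimnames = list(NULL, as.character(1:5), "2534"))

# Flattening the array joins the class label and the n.trees label,
# which would explain column names of the kind shown in the question.
names(as.data.frame(preds_demo))

# Dropping the third dimension gives a plain observations-by-classes matrix.
prob_matrix <- preds_demo[, , 1]

# Index of the most probable class per row; both forms agree on this data.
pred_class <- apply(preds_demo, 1, which.max)
pred_class_alt <- max.col(prob_matrix, ties.method = "first")
```

Because the class labels here are 1 to 5, the column index coincides with the class label. With other labels, the index would need to be mapped back to the labels before being compared with the response.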

OTHER TIPS

An alternative is to define a custom scoring function (for instance, the ratio between the probabilities of two classes) and do the averaging yourself:

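library(pROC)
preds2 <- as.data.frame(preds2)  # [[ ]] below selects columns, so preds2 must be a data frame
names(preds2) <- 1:5
aucs <- combn(1:5, 2, function(X) {
    # Score each pair of classes by the ratio of their predicted probabilities
    auc(roc(ProxFiltered$response_variable, preds2[[X[1]]] / preds2[[X[2]]], levels = X))
})
mean(aucs)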
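
For comparison, the Hand and Till (2001) measure averages, over every pair of classes i and j, the mean of A(i|j) and A(j|i), where A(i|j) is the AUC for separating class i from class j using the predicted probability of class i. The following is a minimal base R sketch of that averaging, not pROC's implementation. The functions auc_rank and hand_till_M and the toy data are hypothetical, and prob is assumed to be a matrix whose column names match the class labels.

```r
# AUC in its Mann-Whitney (rank-sum) form; average ranks handle ties.
auc_rank <- function(score, is_pos) {
  r <- rank(score)
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Average of the pairwise measures (A(i|j) + A(j|i)) / 2 over all class pairs.
hand_till_M <- function(prob, truth) {
  classes <- colnames(prob)
  pair_auc <- combn(classes, 2, function(p) {
    keep <- truth %in% p  # only observations from the two classes in this pair
    a_ij <- auc_rank(prob[keep, p[1]], truth[keep] == p[1])
    a_ji <- auc_rank(prob[keep, p[2]], truth[keep] == p[2])
    (a_ij + a_ji) / 2
  })
  mean(pair_auc)
}

# Toy data (invented): three classes, each observation strongly favours its own class.
truth <- c("1", "1", "2", "2", "3", "3")
prob <- rbind(c(0.8, 0.1, 0.1), c(0.8, 0.1, 0.1),
              c(0.1, 0.8, 0.1), c(0.1, 0.8, 0.1),
              c(0.1, 0.1, 0.8), c(0.1, 0.1, 0.8))
colnames(prob) <- c("1", "2", "3")

m <- hand_till_M(prob, truth)
```

On real data, prob would be the observations-by-classes probability matrix and truth the validation response.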

Yet another (better) option is to convert your question to a non-binary one, i.e. is the best prediction (or some weighted-best prediction) correlated with the true class?
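
One possible reading of this suggestion is to compare the most probable class directly with the true class, for example with a confusion table, an accuracy figure, and a rank correlation. The sketch below uses invented data. The names truth_demo and pred_demo are hypothetical, and the rank correlation is only meaningful if the class labels are ordered.

```r
# Invented true and predicted class labels for eight observations.
truth_demo <- c(1, 1, 2, 3, 3, 4, 5, 5)
pred_demo  <- c(1, 2, 2, 3, 3, 4, 4, 5)

# Cross-tabulate true class against predicted class.
conf_table <- table(truth = truth_demo, predicted = pred_demo)

# Share of observations whose most probable class matches the true class.
accuracy <- mean(truth_demo == pred_demo)

# Rank correlation between true and predicted class (assumes ordered classes).
rho <- cor(truth_demo, pred_demo, method = "spearman")
```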

Licensed under: CC-BY-SA with attribution