
I hope this is not too naive a question. I am fitting a series of binomial regressions with different models through the caret package in R. All are working so far except earth (MARS). Typically, the binomial family is passed to glm through the earth function as glm=list(family=binomial). This seems to be working (as shown below). With the general predict() function, I would use type="response" to put the prediction on the probability scale. The examples below show the non-caret approach in fit1, with the correct prediction in pred1; pred1a is the unscaled prediction made without type="response". fit2 is the caret approach and pred2 is its prediction, which is the same as the unscaled prediction in pred1a. Digging through the fit2 object, the properly scaled fitted values are present in the glm.list component, so the earth() function is behaving as it should.

The question is: since caret's predict() function only accepts type="prob" or type="raw", how can I instruct it to predict on the scale of the response?

Thank you very much.


library(earth)
library(caret)

# Non-caret approach: earth passes the binomial family on to glm
fit1 <- earth(am ~ cyl + mpg + wt + disp, data = mtcars,
              degree = 1, glm = list(family = binomial))
pred1 <- predict(fit1, newdata = mtcars, type = "response")
# [1] 0.0004665284 0.9979135993   # correct: binomial on the response scale

pred1a <- predict(fit1, newdata = mtcars)
# [1] -7.669725  6.170226   # without type = "response"

# caret approach
fit2ctrl <- trainControl(method = "cv", number = 5)
fit2 <- train(am ~ cyl + mpg + wt + disp, data = mtcars, method = "earth",
              trControl = fit2ctrl, tuneLength = 3,
              glm = list(family = binomial))
pred2 <- predict(fit2, newdata = mtcars)
# [1] -7.669725  6.170226   # same as pred1a

# within the glm.list component of fit2
# [1] 0.0004665284 0.9979135993

There are a few things to note:

  • the outcome (mtcars$am) is numeric 0/1, so train treats this as a regression model
  • when the outcome is a factor, train assumes classification and automatically adds glm=list(family=binomial)
  • for classification with train, you need to add classProbs = TRUE to trainControl so that the model produces class probabilities, which you then request with predict(..., type = "prob").
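
Applied to the data in the question, the three points above could look roughly like the following. This is a minimal sketch, not a tested recipe: the copy `cars`, the level names "manual" and "automatic", and the seed are assumptions introduced here for illustration (in mtcars, am = 1 means a manual transmission). With so few rows, glm may warn about fitted probabilities of 0 or 1.

```r
library(caret)
library(earth)

# Assumption: work on a copy and recode the 0/1 outcome as a factor,
# so that train() treats the problem as classification.
# Level names must be valid R names when classProbs = TRUE.
cars <- mtcars
cars$am <- factor(ifelse(cars$am == 1, "manual", "automatic"),
                  levels = c("manual", "automatic"))

# classProbs = TRUE lets the final model produce class probabilities
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

set.seed(1)  # arbitrary seed, only to make the CV folds repeatable
fit <- train(am ~ cyl + mpg + wt + disp, data = cars,
             method = "earth", trControl = ctrl, tuneLength = 3)

# One column of probabilities per class level
probs <- predict(fit, newdata = cars, type = "prob")
head(probs)
```

The "manual" column plays the role that type="response" played in the non-caret fit, since "manual" corresponds to am = 1.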

Here is an example with a different data set in the earth package:



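library(earth)
library(caret)
data(etitanic)

# glm = list(family = binomial) is needed for type = "response" to return probabilities
a1 <- earth(survived ~ ., 
            data = etitanic,
            glm = list(family = binomial),
            degree = 2,       
            nprune = 5)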

etitanic$survived <- factor(ifelse(etitanic$survived == 1, "yes", "no"),
                            levels = c("yes", "no"))

a2 <- train(survived ~ ., 
            data = etitanic, 
            method = "earth",
            tuneGrid = data.frame(degree = 2, nprune = 5),
            trControl = trainControl(method = "none", 
                                     classProbs = TRUE))


> predict(a1, head(etitanic), type = "response")
[1,] 0.8846552
[2,] 0.9281010
[3,] 0.8846552
[4,] 0.4135716
[5,] 0.8846552
[6,] 0.4135716
> predict(a2, head(etitanic), type = "prob")
        yes         no
1 0.8846552 0.11534481
2 0.9281010 0.07189895
3 0.8846552 0.11534481
4 0.4135716 0.58642840
5 0.8846552 0.11534481
6 0.4135716 0.58642840
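
As a side check on the numbers in the question: the unscaled predictions appear to be on the logit (link) scale, so applying the inverse logit should recover the response-scale values. A minimal sketch using base R's plogis(); the two input values are copied from the question's output, and the variable names are only illustrative.

```r
# Link-scale predictions copied from the question (pred1a / pred2)
link_scale <- c(-7.669725, 6.170226)

# Inverse logit: 1 / (1 + exp(-x))
response_scale <- plogis(link_scale)
print(response_scale)
```

The result should agree with the values shown for pred1 up to the rounding of the printed inputs. This only illustrates how the two scales relate. Recoding the outcome as a factor and using type = "prob" remains the approach described above.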


