Question

How can I find out which variables are actually used in a constructed tree?

model = tree(status~., set.train)

I can see the variables if I write:

summary(model)

tree(formula = status ~ ., data = set.train)
Variables actually used in tree construction:
[1] "spread1"      "MDVP.Fhi.Hz." "DFA"          "D2"           "RPDE"         "MDVP.Shimmer" "Shimmer.APQ5"
Number of terminal nodes:  8 
Residual mean deviance:  0.04225 = 5.831 / 138 
Distribution of residuals:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.9167  0.0000  0.0000  0.0000  0.0000  0.6667 

But how can I get, as a vector, the indices of the variables that are actually used?


Solution

You can inspect the structure of an object with the str() function. Looking there, you should see a few places from which you can extract the variables of your tree model. Here is one example, which returns the predictors named in the model formula:

> library(tree)
> 
> fit <- tree(Species ~., data=iris)
> attr(fit$terms,"term.labels")
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"

EDIT: Since you specifically asked for the indices, you can match() those names back to the variable names in your dataset (although they may always be in order; I haven't used the tree package before, so I can't say).

> match(attr(fit$terms,"term.labels"),names(iris))
[1] 1 2 3 4
> names(iris)[match(attr(fit$terms,"term.labels"),names(iris))]
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 

EDIT2:

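You're right! The term labels list every predictor in the formula, not just the ones the tree actually splits on. Try this: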

> summary(fit)$used
[1] Petal.Length Petal.Width  Sepal.Length
Levels: <leaf> Sepal.Length Sepal.Width Petal.Length Petal.Width
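
To get the column indices the question asks for, the two steps above can be combined: take the variables reported by summary() and match() them against the column names of the data. This is only a minimal sketch using the iris fit from this answer; the names `used` and `idx` are made up for illustration, and it assumes summary(fit)$used is a factor, as the printed output suggests.

```r
library(tree)

# Fit the same example tree as above
fit <- tree(Species ~ ., data = iris)

# Variables the tree actually splits on; convert the factor to plain strings
used <- as.character(summary(fit)$used)

# Positions of those variables among the columns of the training data
idx <- match(used, names(iris))

print(used)
print(idx)
```

For the question's own model, the same idea would be match(as.character(summary(model)$used), names(set.train)).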

OTHER TIPS

I think this is what you're looking for

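library(rpart)
fit <- rpart(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris)
# Use the values of frame$var rather than its levels: the levels can include
# predictors that never appear in a split.
used.var <- setdiff(unique(as.character(fit$frame$var)), "<leaf>")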

It has been quite a while, and this uses package rpart instead of tree, but I think Brian Ripley's solution, as coded in rpart::printcp(), could still be of interest. It goes like this:

library(rpart)
r.rp <- rpart(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris)
r.rp

# extract from rpart::printcp()
frame <- r.rp$frame
leaves <- frame$var == "<leaf>"
used <- unique(frame$var[!leaves])
if (!is.null(used)) {
  cat("Variables actually used in tree construction:\n")
  print(sort(as.character(used)), quote = FALSE)
  cat("\n")
}
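
The printcp() logic above prints the names; to get indices instead, the same non-leaf rows of the frame can be matched against the column names of the data. This is a rough sketch on iris rather than on the original data, and `split_vars`, `used` and `idx` are names chosen here for illustration.

```r
library(rpart)

fit <- rpart(Species ~ ., data = iris)

# One entry per node; leaf nodes are marked "<leaf>"
split_vars <- as.character(fit$frame$var)

# Keep each splitting variable once, dropping the leaf marker
used <- unique(split_vars[split_vars != "<leaf>"])

# Column positions of the used variables in the training data
idx <- match(used, names(iris))

print(used)
print(idx)
```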

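If you are willing to switch to the similar package rpart, you can get the variables ordered by importance directly from the fit. Note that variable.importance also credits variables that appear only as surrogate splits, so it can list more variables than the tree actually splits on: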

fit <- rpart(Species ~., data=iris)
fit$variable.importance
Licensed under: CC-BY-SA with attribution