Question

RPART uses a different splitting procedure for continuous, ordinal and categorical variables. Is there a way to "inform" RPART about the variable type? For illustration, I have an ordinal variable with integer values (1,..,5). Right now, I need to coerce it into characters so RPART will not split it like a continuous variable.

I would like to refrain from changing all my variable types, just for RPART. I would prefer to declare it, somehow.

Thanks.


Solution

The difficulty is that R has no way of knowing that foo <- c(1,2,3,2,4,5,1,5) (for example) is not a numeric variable. If you look at the class of foo, you'll see it is numeric.

R> class(foo)
[1] "numeric"

The underlying problem is that, at a very basic level, you have not told R what the data types are. The simple solution in this case is not to convert the variable to a character vector but to convert it to an ordered factor, if only because that is what the data are. rpart should pick up the factor aspect and treat it accordingly.

Therefore, the way to inform rpart that the variable is ordinal is to tell R that it is ordinal:

foo <- as.ordered(foo)

R> foo
[1] 1 2 3 2 4 5 1 5
Levels: 1 < 2 < 3 < 4 < 5
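As a rough illustration of how this might look with a model fit, the sketch below declares the type once on a data frame column and then passes the data frame to rpart. The data frame d, its columns rating, x and y, and the simulated values are all hypothetical and only serve to make the example self-contained.

```r
library(rpart)

set.seed(1)

# Hypothetical data: 'rating' is an ordinal score stored as integers 1..5
d <- data.frame(
  rating = sample(1:5, 200, replace = TRUE),
  x      = rnorm(200)
)
# Hypothetical response that depends on the ordinal score
d$y <- factor(ifelse(d$rating >= 4, "high", "low"))

# Declare the type once, instead of coercing to character
d$rating <- factor(d$rating, levels = 1:5, ordered = TRUE)

# rpart reads the variable type from the column itself
fit <- rpart(y ~ rating + x, data = d, method = "class")
print(fit)
```

Because the declaration lives in the data frame, no rpart-specific coercion is needed in the model call itself.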

I suspect you are missing out on other features of R because you have not told it the nature of the data. R is making an assumption about the data that is not correct.
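To give a sense of what changes once the type is declared, here is a small sketch using the same example vector. It is only an illustration of a few base R behaviours for ordered factors, not an exhaustive list.

```r
foo <- c(1, 2, 3, 2, 4, 5, 1, 5)
foo_ord <- as.ordered(foo)

# Counts per level rather than a numeric five-number summary
counts <- table(foo_ord)
print(counts)

# Order-aware comparisons and extremes still work on ordered factors
n_at_least_3 <- sum(foo_ord >= "3")
top_level    <- as.character(max(foo_ord))

# The original integer codes can be recovered when needed
back <- as.integer(as.character(foo_ord))
```

Here n_at_least_3 counts the observations at level 3 or above, and back reproduces the original values, so the conversion loses no information.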

Licensed under: CC-BY-SA with attribution