Question

I have attempted to email the author of this package without success, just wondering if anybody else has experienced this.

I am having an issue using rpart on 4000 rows of data with 13 attributes. I can run the same test on 300 rows of the same data with no issue. When I run it on 4000 rows, Rgui.exe runs consistently at 50% CPU and the UI hangs; it stays like this for at least 4-5 hours if I let it run, and never finishes or becomes responsive.

Here is the code I am using on both the 300-row and the 4000-row subsets:

train <- read.csv("input.csv", header=T)
y <- train[, 18]     # response column
x <- train[, 3:17]   # predictor columns
library(rpart)
fit <- rpart(y ~ ., x)

Is this a known limitation of rpart, or am I doing something wrong? Are there potential workarounds?

Solution

The problem here was a data prep error: a header row had been re-written far down in the middle of the data set.
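
If you run into something similar, one way to confirm that a stray header row has been read in as data is to look for rows whose fields repeat the column names. The snippet below is only an illustrative sketch; it assumes the duplicated header shows up in the first column of input.csv.

# Illustrative sketch: detect and drop a header row embedded mid-file.
# Assumes the stray header repeats the first column's name in column 1.
train <- read.csv("input.csv", header=TRUE, stringsAsFactors=FALSE)
bad_rows <- which(train[[1]] == names(train)[1])   # rows that repeat the header
if (length(bad_rows) > 0) {
  train <- train[-bad_rows, ]
  # columns forced to character by the stray header get re-typed
  train[] <- lapply(train, type.convert, as.is=TRUE)
}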

OTHER TIPS

Can you reproduce the problem when you feed rpart random data of similar dimensions, rather than your real data (from input.csv)? If not, it is probably a problem with your data (formatting, perhaps). After importing your data with read.csv, check for format issues by looking at the output of str(train).

# How to do an equivalent rpart fit on some random data of equivalent dimension
dats <- data.frame(matrix(rnorm(4000 * 14), nrow=4000))

y <- dats[, 1]
x <- dats[, -1]
library(rpart)
system.time(fit <- rpart(y ~ ., x))
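
As a complement, here is a minimal sketch of the kind of post-import checks meant above, run on the real data; no particular column names are assumed.

# Quick sanity checks after read.csv: a column that should be numeric but
# shows up as character or factor usually points to a formatting problem.
train <- read.csv("input.csv", header=TRUE)
str(train)             # types and a preview of each column
summary(train)         # unexpected factor levels or NA counts stand out here
sapply(train, class)   # compact view of every column's class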