Question

I built a decision tree from training data using the rpart package in R. Now I have more data, and I want to run it through the tree to check the model. Logically/iteratively, I want to do the following:

for each datapoint in new data
     run point thru decision tree, branching as appropriate
     examine how tree classifies the data point
     determine if the datapoint is a true positive or false positive

How do I do that in R?


Solution

To do this, I assume you have split your data into a training set and a test set.
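If you have not split the data yet, a simple random split works. Here is a minimal sketch; the 70/30 ratio and the data frame name mydata are just placeholder assumptions:

set.seed(42)                                          # for a reproducible split
idx <- sample(nrow(mydata), floor(0.7 * nrow(mydata)))  # pick 70% of the rows for training
traindata <- mydata[idx, ]
testdata  <- mydata[-idx, ]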

To create the training model you can use:

library(rpart)
model <- rpart(y ~ ., data = traindata, minbucket = 5)   # you have presumably already done this step

To apply it to the test set:

pred <- predict(model, testdata, type = "class")   # type = "class" returns predicted classes rather than class probabilities

This gives you a vector of predicted classes.

Your test set also contains the "real" answer. Let's say it is in the last column of the test set.

Simply comparing them element-wise yields the result:

pred == testdata[ , last]  # where 'last' equals the index of 'y'

When the elements are equal you get TRUE; a FALSE means the prediction was wrong.

pred == 1 & testdata[, last] == 1   # true positives: predicted positive and actually positive (assumes the positive class is coded 1)
pred == testdata[, last]            # correct predictions (true positives and true negatives)
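To see true positives, false positives, false negatives, and true negatives all at once, a confusion matrix is the usual tool. A minimal sketch, assuming the actual classes are in testdata[, last]:

tab <- table(predicted = pred, actual = testdata[, last])   # cross-tabulate predictions against the actual classes
tab   # diagonal counts are correct; off-diagonal counts are false positives / false negatives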

It might also be interesting to see what percentage you got correct:

mean(pred == testdata[ , last])    # here TRUE will count as a 1, and FALSE as 0
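As an optional alternative, if you have the caret package installed, its confusionMatrix() function reports accuracy together with sensitivity and specificity in one call; this sketch assumes both vectors are factors with the same levels:

library(caret)                                        # optional helper package
confusionMatrix(pred, as.factor(testdata[, last]))    # prints the confusion matrix plus accuracy, sensitivity, specificity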