Question

I'm transforming an array to a data frame and I want to use random forest on that data frame. The problem is that I'm getting too much output from predict.

This is a similar example I created to reproduce the problem:

library(randomForest)

matTest <- array(1:5120, dim=c(10,512))
dataTest <- data.frame(matTest)
dataTest$y <- 1:10
TEST.rf <- randomForest(y ~ ., dataTest)
predict(TEST.rf, data=dataTest[1,])

The output from predict is:

       1        2        3        4        5        6        7        8        9       10 
3.308430 2.778164 2.749053 3.093386 4.027957 5.143252 6.873542 7.707022 7.902198 7.621082 

But I should be getting only a single numeric value from predict, since every row should be an individual sample.

I don't know what I'm doing wrong...


Solution

You should check ?predict.randomForest to make sure that you know the names of the arguments of the function you intend to use.

You should be using newdata = ... instead of data = ....

Since data doesn't match any of the named arguments, it is passed on to ... and then ignored, which means that you get back the default: the out-of-bag predictions for the original data set.
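To make the difference concrete, here is a minimal sketch that reuses the example from the question. The set.seed call is an assumption added only for reproducibility, and the variable names ignored, oob and single are just illustrative. It compares the misspelled call, the call with no new data at all, and the corrected call:

```r
library(randomForest)

set.seed(1)  # assumption: added only so repeated runs give the same forest

matTest <- array(1:5120, dim = c(10, 512))
dataTest <- data.frame(matTest)
dataTest$y <- 1:10
TEST.rf <- randomForest(y ~ ., dataTest)

# "newdata" is a named argument of the predict method; "data" is not
argNames <- names(formals(getS3method("predict", "randomForest")))
"newdata" %in% argNames
"data" %in% argNames

# data = ... falls into "..." and is ignored, so this is the same as predict(TEST.rf)
ignored <- predict(TEST.rf, data = dataTest[1, ])
oob <- predict(TEST.rf)
identical(ignored, oob)
length(ignored)  # 10: one out-of-bag prediction per training row

# newdata = ... scores only the row that was supplied
single <- predict(TEST.rf, newdata = dataTest[1, ])
length(single)   # 1
```

Note that the single value returned with newdata will generally differ from the first out-of-bag value, because the out-of-bag prediction for a row only uses the trees that did not see that row during training.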

Licensed under: CC-BY-SA with attribution