R unexpected NA output from RandomForest

https://stackoverflow.com/questions/10367579

04-06-2021

Problem

I'm working with a data set that has a lot of NAs. I know that the first 6 columns do NOT have any NAs. Since the first column is an ID column, I'm omitting it.

I run the following code to select only the rows that have a value in the response column (column 70):

sub1 <- TrainingData[which(!is.na(TrainingData[,70])),]

I then use sub1 as the data set in a randomForest using this code:

library(randomForest)
set.seed(448)
# columns 2-6 are the predictors, column 70 is the response
RF <- randomForest(sub1[, 2:6], sub1[, 70], do.trace = TRUE,
                   importance = TRUE, ntree = 10, keep.forest = TRUE)

Then I run this code to check the output for NAs:

> length(which(is.na(RF$predicted)))
[1] 65

I can't figure out why I'd be getting NAs if the data going in is clean.

Any suggestions?

Solution

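I think you should use more trees. The values in RF$predicted are predictions for the out-of-bag (OOB) set: each row is predicted only by the trees whose bootstrap sample did not include it. Because that set is formed randomly, with a very small number of trees (here ntree=10) some rows are never out of bag for any tree, so they get no prediction and show up as NA.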
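To make the reasoning concrete: by default randomForest grows each tree on a bootstrap sample of n rows drawn with replacement, so a given row is out of bag for one tree with probability (1 - 1/n)^n, roughly e^-1 ≈ 0.368. It is in bag for all ntree trees, and therefore never gets an OOB prediction, with probability (1 - (1 - 1/n)^n)^ntree ≈ 0.632^ntree: about 1% of rows for ntree = 10, effectively none for the default ntree = 500. The base R sketch below only simulates that sampling and does not call randomForest itself; the helper names never_oob and expected_never_oob, and the row count of 5000, are hypothetical choices for illustration.

```r
# Hypothetical helper: simulate randomForest-style bootstrap sampling
# (n rows drawn with replacement per tree) and count the rows that are
# never out-of-bag, i.e. rows that cannot receive an OOB prediction.
never_oob <- function(n, ntree) {
  oob_times <- integer(n)  # how many trees each row was out-of-bag for
  for (t in seq_len(ntree)) {
    in_bag <- tabulate(sample.int(n, n, replace = TRUE), nbins = n) > 0
    oob_times[!in_bag] <- oob_times[!in_bag] + 1L
  }
  sum(oob_times == 0L)
}

# Hypothetical helper: expected number of never-OOB rows,
# n * (1 - (1 - 1/n)^n)^ntree
expected_never_oob <- function(n, ntree) {
  n * (1 - (1 - 1 / n)^n)^ntree
}

set.seed(448)
n <- 5000                      # illustrative row count, not from the question
expected_never_oob(n, 10)      # about 51 rows (~1%)
never_oob(n, 10)               # random, but typically close to the expectation
never_oob(n, 500)              # 0 in practice
```

On a fitted model, the rows with RF$oob.times == 0 should be the ones whose RF$predicted is NA, and refitting with a larger ntree (for example the default 500) should make them disappear. This assumes randomForest's default sampling settings (replace = TRUE, sample size equal to the number of rows).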

License: CC-BY-SA with attribution

Not affiliated with StackOverflow