R unexpected NA output from RandomForest
04-06-2021
Question
I'm working with a data set that has a lot of NAs. I know that the first 6 columns do NOT contain any NAs. Since the first column is an ID column, I'm omitting it.
I run the following code to keep only the rows that have a value in the response column:
sub1 <- TrainingData[which(!is.na(TrainingData[,70])),]
I then use sub1 as the data set in a randomForest using this code:
library(randomForest)
set.seed(448)
# predictors: columns 2 to 6; response: column 70
RF <- randomForest(sub1[, 2:6], sub1[, 70],
                   do.trace = TRUE, importance = TRUE, ntree = 10, keep.forest = TRUE)
Then I run this code to check the output for NAs:
> length(which(is.na(RF$predicted)))
[1] 65
I can't figure out why I'd be getting NAs if the data going in is clean.
Any suggestions?
Solution
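I think you should use more trees. The values in RF$predicted are out-of-bag predictions: each case is predicted only by the trees whose bootstrap sample did not include it. Because the bootstrap samples are drawn at random, with a very small number of trees some cases are never out-of-bag, so they get no prediction and show up as NA.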
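To see how often this happens, here is a rough base-R simulation of the bootstrap sampling alone (no randomForest call). It assumes each tree draws n rows with replacement, which is the package default. The row count n is a made-up illustrative value, not the size of the data in the question. For a single tree a row is in-bag with probability 1 - (1 - 1/n)^n, about 0.632, so with 10 trees roughly 0.632^10, or about 1% of rows, are expected never to be out-of-bag.

# Simulate which rows are never out-of-bag (OOB) in a small forest.
# n is a hypothetical row count chosen for illustration only.
set.seed(448)
n <- 5000      # hypothetical number of training rows
ntree <- 10    # same small forest size as in the question

# oob_count[i] = number of trees whose bootstrap sample left out row i
oob_count <- integer(n)
for (t in seq_len(ntree)) {
  in_bag <- unique(sample.int(n, n, replace = TRUE))  # bootstrap sample for tree t
  is_oob <- rep(TRUE, n)
  is_oob[in_bag] <- FALSE
  oob_count <- oob_count + is_oob
}

# Rows with oob_count == 0 would have no OOB prediction (NA in RF$predicted)
never_oob <- sum(oob_count == 0)
expected <- n * (1 - (1 - 1/n)^n)^ntree
cat("never OOB:", never_oob, " expected about:", round(expected, 1), "\n")

On a fitted model the same count should be visible as sum(RF$oob.times == 0). Raising ntree (the default is 500) makes it very unlikely that any row is left without an out-of-bag prediction. If predictions for every training row are needed regardless, predict(RF, sub1[, 2:6]) uses all trees, but those are in-sample predictions rather than out-of-bag ones.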