Question

I'm looking for R packages or machine learning models/algorithms (like randomForest, glmnet, gbdt, etc.) that can handle NAs directly, as opposed to dropping any row or column that contains an NA. I'm not looking to impute. Any suggestions?

Solution

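The CART algorithm (rpart package) handles NAs rather seamlessly: rows with missing predictors are kept, and surrogate splits are used when the primary split variable is missing. From there you can always turn to bagged trees built on rpart, probably via the ipred package.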
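As a minimal sketch of the rpart behaviour described above, the following fits a classification tree to a copy of the built-in iris data after some predictor values have been set to NA. The choice of data set, the columns blanked out, and the seed are illustrative assumptions, not part of the original answer.

```r
library(rpart)

# Copy a built-in data set and knock out some predictor values at random
# (illustrative choice of columns and counts).
set.seed(1)
df <- iris
df$Sepal.Length[sample(nrow(df), 20)] <- NA
df$Petal.Width[sample(nrow(df), 20)] <- NA

# rpart's default na.action (na.rpart) keeps rows with missing predictors;
# it only drops rows missing the response or missing every predictor.
fit <- rpart(Species ~ ., data = df, method = "class")

# Prediction also accepts rows with missing predictors, using surrogate splits.
pred <- predict(fit, newdata = df, type = "class")

# How many rows the tree was fitted on, and how many predictions contain NA.
length(fit$where)
sum(is.na(pred))
```

No rows are imputed or discarded here. ipred::bagging builds its trees with rpart, so passing it the same data frame should work in the same way, though that step is not shown.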

I've heard that multivariate adaptive regression splines (mars in the mda package) handle missing data well, although I don't have much experience with them.

Also, k-nearest-neighbor models (and kernel methods more generally, I think) can be adapted to deal with missing values in a fairly straightforward manner, although implementations may not do that out of the box. Presumably it would be as simple as adjusting the distance metric so that, for each pair of points, it only uses the variables observed in both. I'm less familiar with specific R packages that go beyond the vanilla knn models.
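The distance adjustment described above could be sketched roughly as follows, in base R. The function names na_dist and knn_na are hypothetical helpers written for this illustration, not functions from any package, and the rescaling by the number of shared variables is one possible convention rather than the only one.

```r
# Euclidean distance using only the coordinates observed in both vectors.
# The squared sum is rescaled by (total dims / shared dims) so that pairs with
# few shared coordinates are not automatically favoured (an assumed convention).
# Returns Inf when the two vectors share no observed coordinate.
na_dist <- function(a, b) {
  ok <- !is.na(a) & !is.na(b)
  if (!any(ok)) return(Inf)
  sqrt(sum((a[ok] - b[ok])^2) * length(a) / sum(ok))
}

# Hypothetical helper: majority-vote kNN for a single query point.
knn_na <- function(train_x, train_y, query, k = 3) {
  d <- apply(train_x, 1, na_dist, b = query)   # distance to every training row
  nearest <- order(d)[seq_len(k)]              # indices of the k closest rows
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]
}

# Toy data with missing entries in both the training set and the queries.
train_x <- rbind(c(1, 1), c(1.2, NA), c(NA, 0.9),
                 c(5, 5), c(5.1, NA), c(NA, 4.8))
train_y <- c("a", "a", "a", "b", "b", "b")

knn_na(train_x, train_y, query = c(1.1, NA), k = 3)  # "a"
knn_na(train_x, train_y, query = c(NA, 5.2), k = 3)  # "b"
```

Training rows that share no observed variable with the query get an infinite distance, so they are only chosen if fewer than k comparable rows exist. In practice the variables would usually be scaled first.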

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow