Question

I'm currently using R to do feature selection with Random Forest regression. I want to split my data 70:30, which is easy enough to do. However, I want to do this 10 times, each time obtaining a different set of examples from the one before.

> trainIndex<- createDataPartition(lipids$RT..seconds., p=0.7, list=F)
> lipids.train <- lipids[trainIndex, ]
> lipids.test <- lipids[-trainIndex, ]

This is what I'm doing at the moment, and it works great for splitting my data 70:30. But when I do it again, I get the same 70% of the data in my training set and the same 30% in my test set. I know this is how createDataPartition works, but is there a way of making it so that I get a different 70% of the data the next time I run it?

Thanks


Solution

In the future, please include the packages you're using since createDataPartition is not in base R. I'm assuming you're using the caret package. If that is correct, did you find the times argument?

trainIndex <- createDataPartition(lipids$RT..seconds., p = 0.7, list = FALSE, times = 10)
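
With times=10 and a non-list result, createDataPartition returns a matrix with one column of row indices per resample, so each column can be used in turn to build a train/test pair. A minimal sketch, assuming the caret package is installed; the lipids data frame below is a made-up stand-in for the asker's data, and the splits name is only illustrative:

```r
library(caret)

# Hypothetical stand-in for the asker's data frame
set.seed(42)
lipids <- data.frame(RT..seconds. = runif(100, 60, 600), x1 = rnorm(100))

# One column of training-row indices per resample
trainIndex <- createDataPartition(lipids$RT..seconds., p = 0.7,
                                  list = FALSE, times = 10)

# Build a list of 10 train/test pairs, one per column
splits <- lapply(seq_len(ncol(trainIndex)), function(i) {
  idx <- trainIndex[, i]
  list(train = lipids[idx, ], test = lipids[-idx, ])
})
```

Each element of splits then holds one 70:30 partition, for example splits[[1]]$train and splits[[1]]$test.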

As mentioned in the comment, you can just as simply use sample:

sample(seq_along(lipids$RT..seconds.), as.integer(0.7 * nrow(lipids)))

sample draws from R's random number generator, whose state advances with every call, so unless you reset it with set.seed you will get a different subset of rows each time it is run.
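
The sample approach can be repeated ten times with replicate. This is only a sketch in base R: the lipids data frame is made-up stand-in data, the names n_train and splits are illustrative, and the set.seed call is optional (it just makes the whole sequence of splits reproducible).

```r
# Hypothetical stand-in for the asker's data frame
set.seed(123)
lipids <- data.frame(RT..seconds. = runif(50, 60, 600), x1 = rnorm(50))

# Size of each training set (70% of the rows, rounded down)
n_train <- floor(0.7 * nrow(lipids))

# Ten independent draws of training-row indices, kept as a list
splits <- replicate(10, sample(nrow(lipids), n_train), simplify = FALSE)

# Train/test pair for the first draw; the other nine work the same way
train_1 <- lipids[splits[[1]], ]
test_1  <- lipids[-splits[[1]], ]
```

Unlike createDataPartition, plain sample does not try to balance the distribution of the outcome between the training and test sets.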

OTHER TIPS

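library(dplyr)
# Number of training rows (70% of the data, truncated to an integer)
n <- as.integer(nrow(data) * 0.7)
# Sample n row indices without replacement for the 70% set
data_70 <- data[sample(nrow(data), n), ]
# The rows not in data_70 form the 30% set; anti_join matches on all shared
# columns, so this assumes data has no duplicated rows
data_30 <- anti_join(data, data_70)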
Licensed under: CC-BY-SA with attribution