One possible solution is to store the sampled row indices in their own vector, so that the same indices can be reused to select both subsets.
train_idx <- sample(1:nrow(mydata),1000,replace=FALSE)
train <- mydata[train_idx,] # select all these rows
test <- mydata[-train_idx,] # select all but these rows
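As a quick sanity check, here is a minimal sketch of the same split on a toy data frame. The contents of mydata and the seed are assumptions made for illustration only; the point is that the two subsets should be disjoint and together cover every row.

```r
# Toy data frame standing in for mydata (an assumption for illustration).
set.seed(42)  # make the sample reproducible
mydata <- data.frame(a = 1:5000, b = rep(letters, length.out = 5000))

# Draw 1000 distinct row indices and keep them for reuse.
train_idx <- sample(seq_len(nrow(mydata)), 1000, replace = FALSE)
train <- mydata[train_idx, ]   # the sampled rows
test  <- mydata[-train_idx, ]  # every row that was not sampled

# The two pieces should be disjoint and together cover every row.
stopifnot(nrow(train) + nrow(test) == nrow(mydata))
stopifnot(length(intersect(row.names(train), row.names(test))) == 0)
```

Note that negative indexing only behaves this way when train_idx is non-empty; with an empty index vector, mydata[-train_idx, ] returns zero rows rather than all of them.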
Also, since a data.frame's row.names attribute must consist of unique values, you can select the test rows by name instead, e.g.
test <- mydata[!(row.names(mydata) %in% row.names(train)), ]
But the second solution is about 2x slower on mydata <- data.frame(a=1:100000, b=rep(letters, len=100000)), as measured by microbenchmark().
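To get a rough feel for the difference without installing anything, the comparison can be sketched with base R's system.time() instead of microbenchmark(). This is a swapped-in, coarser timing method, and the helper names by_index and by_name, the seed, and the repetition count are assumptions for illustration. Timings depend on the machine and R version, so the ratio you see may differ from the 2x reported above.

```r
mydata <- data.frame(a = 1:100000, b = rep(letters, length.out = 100000))
set.seed(1)
train_idx <- sample(seq_len(nrow(mydata)), 1000)
train <- mydata[train_idx, ]

# Hypothetical wrappers around the two approaches from the answer.
by_index <- function() mydata[-train_idx, ]
by_name  <- function() mydata[!(row.names(mydata) %in% row.names(train)), ]

# Both approaches should select exactly the same rows, in the same order.
stopifnot(identical(row.names(by_index()), row.names(by_name())))

# Rough timing over repeated calls; numbers vary by machine and R version.
n_reps <- 50
t_index <- system.time(for (i in seq_len(n_reps)) by_index())[["elapsed"]]
t_name  <- system.time(for (i in seq_len(n_reps)) by_name())[["elapsed"]]
print(c(by_index = t_index, by_name = t_name))
```

The name-based version has to convert all row names to character and run %in% over them on every call, which is a plausible reason for it being slower than plain negative indexing.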