Question

I'm doing classification in R. I have a data frame of test data called testD and a second data frame called results, which holds the correct classification values for the same rows.

These dataframes are pretty big and it takes far too long to train my model on the entire set, so I'd like to randomly select roughly 5,000 data points and train on that.

I know how to randomly select 5,000 rows from either data frame, but I need to know exactly which rows I selected from 'testD' so I know which 5,000 rows in 'results' to compare against.

Any help would be greatly appreciated!

Solution

Sample the row indices once, then use that same index vector to subset both data frames, so the rows stay matched:

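# draw 5000 distinct row numbers from testD
indices <- sample(nrow(testD), 5000)
# the same indices select the matching rows in both data frames
testD.sample <- testD[indices, ]
results.sample <- results[indices, ]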
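
For a reproducible version, here is a minimal sketch on toy data. The column names (x1, x2, label), the 20,000-row size, and the seed are made up for illustration and are not from the question. It also assumes testD and results are row-aligned, and it passes drop = FALSE because subsetting a one-column data frame with [indices, ] would otherwise return a plain vector.

```r
set.seed(42)  # arbitrary seed, only so the same rows are drawn on every run

# Toy stand-ins for the real data (hypothetical; the originals are much larger).
testD <- data.frame(x1 = rnorm(20000), x2 = runif(20000))
results <- data.frame(label = sample(c("a", "b"), 20000, replace = TRUE))

# Shared indices only make sense if row i of testD matches row i of results.
stopifnot(nrow(testD) == nrow(results))

n <- min(5000, nrow(testD))        # never ask for more rows than exist
indices <- sample(nrow(testD), n)  # sample() draws without replacement by default

testD.sample <- testD[indices, , drop = FALSE]
results.sample <- results[indices, , drop = FALSE]  # stays a data frame even with one column
```

Keeping indices around also makes it easy to get the rows that were not sampled, for example testD[-indices, ], should a held-out set be needed later.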
Licensed under: CC-BY-SA with attribution