Question

How can I use R to partition a dataset into N equally sized partitions? I've tried something like

    for (i in 1:100){data[i] <- full_data[i:(100000*i),]}

This obviously doesn't work, but it hopefully gives an idea of what I'm trying to accomplish. The full dataset has 1,000,000 rows and is already in random order; I'd like 100 equal, independent datasets of 10,000 rows each.


Solution

This should do it; note that data is initialized as a list first:

    data <- list()
    # rows (i-1)*10000+1 through i*10000 become partition i
    for (i in 1:100) data[[i]] <- full_data[((i - 1) * 10000 + 1):(i * 10000), ]
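
If you prefer to avoid the loop, the same partitioning can be done in a single call; this is a minimal sketch assuming full_data has exactly 1,000,000 rows, as in the question:

    # rep() builds the grouping vector 1,...,1, 2,...,2, ..., 100,...,100,
    # and split() divides the rows of full_data along it
    data <- split(full_data, rep(1:100, each = 10000))
    nrow(data[[1]])  # 10000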

OTHER TIPS

You can create quantile groups over the row index (useful when you want exactly n groups without having to work out the group boundaries yourself):

    data <- data.frame(x = 1:1000000)

    # cut x into n equal-frequency groups using quantile breakpoints
    xtile <- function(x, n) {
        cuts <- quantile(x, probs = seq(0, 1, length = n + 1))
        cut(x, breaks = cuts, include.lowest = TRUE)
    }

    group <- xtile(1:nrow(data), 100)
    all(table(group) == 10000)   # TRUE: every group has exactly 10,000 rows

    data.spl <- split(data, group)
    data.spl[[2]]                # the second partition
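
A nice side effect of the quantile-based grouping is that it should still behave sensibly when the number of rows is not an exact multiple of n: the breaks fall where the quantiles do, so group sizes differ by at most a row or so instead of leaving a short remainder partition.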

I believe the cut2() function from the Hmisc package will also partition equally; its g argument sets the number of quantile groups.
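
A minimal sketch of that approach, assuming Hmisc is installed and data is the data frame from above:

    library(Hmisc)

    # g = 100 asks cut2() for 100 quantile groups over the row index
    group <- cut2(1:nrow(data), g = 100)
    data.spl <- split(data, group)
    length(data.spl)     # 100
    nrow(data.spl[[1]])  # 10000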

Licensed under: CC-BY-SA with attribution