How can I use R to partition a dataset into N equally sized partitions? I've tried something like

    for (i in 1:100){data[i] <- full_data[i:(100000*i),]}

This obviously doesn't work, but it hopefully gives an idea of what I'm trying to accomplish. The full dataset has 1,000,000 rows and is already in random order; I'd like 100 equal, independent datasets of 10,000 rows each.


Solution

That should do it, assuming data is initialized as a list:

    # fill a list with consecutive 10,000-row slices of full_data
    data <- list()
    for (i in 1:100) data[[i]] <- full_data[((i - 1) * 10000 + 1):(i * 10000), ]
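
As a side note, the same slicing can be done without an explicit loop. A minimal sketch using split() and gl(), assuming full_data is a data frame (the example data here is made up):

    # hypothetical stand-in for full_data; substitute your own data frame
    full_data <- data.frame(x = rnorm(1000000))

    # gl(100, 10000) builds a factor with 100 levels, each repeated
    # 10,000 times in order, so split() returns 100 data frames of
    # 10,000 consecutive rows each
    chunks <- split(full_data, gl(100, 10000))

    length(chunks)     # 100
    nrow(chunks[[1]])  # 10000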

Other tips

You can create quantile groups over the row indices (e.g., when you want exactly n groups without having to count rows yourself):

    data <- data.frame(1:1000000)

    # cut x into n groups at its empirical quantiles
    xtile <- function(x, n) {
        cuts <- quantile(x, probs = seq(0, 1, length.out = n + 1))
        cut(x, breaks = cuts, include.lowest = TRUE)
    }

    group <- xtile(1:nrow(data), 100)
    all(table(group) == 10000)  # TRUE: each group has exactly 10,000 rows

    data.spl <- split(data, group)
    data.spl[[2]]  # the second partition
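
The quantile-based cuts are also forgiving when the row count is not an exact multiple of n. A small check, reusing the xtile() helper above with a made-up row count:

    # 1,000,003 rows into 100 groups: sizes stay within one row of each other
    g2 <- xtile(1:1000003, 100)
    range(table(g2))  # group sizes here are 10000 or 10001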

I believe the cut2() function from the Hmisc package will also partition equally; you can set the number of partitions with its g argument.
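
A minimal sketch of that approach, assuming the Hmisc package (which provides cut2()) is installed:

    library(Hmisc)  # cut2() is not in base R

    data <- data.frame(1:1000000)
    # g = 100 asks cut2() for 100 quantile groups of (near-)equal size
    group <- cut2(1:nrow(data), g = 100)
    all(table(group) == 10000)  # each group should contain 10,000 rows

    parts <- split(data, group)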
