Question

How can I use R to partition a dataset into N equally sized partitions? I've tried something like

    for (i in 1:100){data[i] <- full_data[i:(100000*i),]}

Which obviously doesn't work, but hopefully gives an idea of what I'm trying to accomplish. The full dataset has 1,000,000 rows and is already in random order. I'd like 100 equal and independent datasets of 10,000 rows each.

Was it helpful?

Solution

This should do it, assuming `data` is a list:

    data <- list()
    for (i in 1:100) {
      data[[i]] <- full_data[((i - 1) * 10000 + 1):(i * 10000), ]
    }
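The same partitioning can also be done without an explicit loop, using base R's `split()` with a vector of group labels built by `rep()`. A minimal sketch, using a small toy data frame in place of `full_data` (the real data would use 100 groups of 10,000 rows in the same way):

```r
# Toy stand-in for full_data: 1,000 rows split into 10 partitions.
full_data <- data.frame(x = 1:1000)

# rep(1:10, each = 100) labels each row with its partition number;
# split() then returns a list of 10 data frames, 100 rows each.
parts <- split(full_data, rep(1:10, each = nrow(full_data) / 10))

length(parts)       # number of partitions
nrow(parts[[1]])    # rows in the first partition
```

Because the full dataset is already in random order, splitting on consecutive row blocks like this yields independent samples.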

Other tips

You can create quantile groups from the row index (e.g. when you want exactly n groups without having to count rows):

    data <- data.frame(1:1000000)

    xtile <- function(x, n) {
      cuts <- quantile(x, probs = seq(0, 1, length = n + 1))
      cut(x, breaks = cuts, include.lowest = TRUE)
    }

    group <- xtile(1:nrow(data), 100)
    all(table(group) == 10000)

    data.spl <- split(data, group)
    data.spl[[2]]

I believe the cut2() function (from the Hmisc package) will also partition equally, and you can set the number of equal-sized groups with its g argument.
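A minimal sketch of that approach, assuming the Hmisc package is installed (cut2's `g` argument requests a given number of quantile groups; shown here with a small toy data frame):

```r
# Requires the Hmisc package: install.packages("Hmisc") if needed.
library(Hmisc)

full_data <- data.frame(x = 1:1000)

# cut2 with g = 10 assigns each row index to one of 10 equal-sized groups.
group <- cut2(seq_len(nrow(full_data)), g = 10)
parts <- split(full_data, group)

length(parts)              # 10
sapply(parts, nrow)        # group sizes
```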

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow