Question

I have a large dataset with columns IDNum, Var1, Var2, Var3, Var4, Var5, Var6. The variables are boolean, each taking the value 0 or 1, so each row is one of 64 (2^6) possible combinations. I would like to count the number of rows for each combination that is present. Is there an efficient way to do this in R?

Solution

aggregate can do this. Here's a smaller example with two variables:

r <- function() rbinom(10, 1, .5)
d <- data.frame(IDNum=1:10, Var1=r(), Var2=r())
d
   IDNum Var1 Var2
1      1    0    1
2      2    0    1
3      3    0    0
4      4    1    0
5      5    1    1
6      6    0    0
7      7    1    1
8      8    1    0
9      9    0    1
10    10    0    1

Now to count the number of each combination:

> aggregate(d$IDNum, d[-1], FUN=length)
  Var1 Var2 x
1    0    0 2
2    1    0 2
3    0    1 4
4    1    1 2

The values in d$IDNum don't matter here; aggregate just needs some vector to split by combination. The IDNum values belonging to each combination are passed to length, which returns how many there are, and that is the row count.
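The same call extends to the six variables in the question. The following is a minimal sketch on made-up data: the data frame `dat`, the seed, the row count `n` and the column name `count` are all assumptions for illustration, not part of the original answer.

```r
# Hypothetical data shaped like the question's: IDNum plus six 0/1 columns
set.seed(1)
n <- 200
dat <- data.frame(IDNum = 1:n,
                  replicate(6, sample(0:1, n, replace = TRUE)))
names(dat) <- c("IDNum", paste0("Var", 1:6))

# Count the rows for each combination of Var1..Var6 that occurs in the data
counts <- aggregate(dat$IDNum, dat[-1], FUN = length)

# aggregate names the result column "x"; rename it for readability
names(counts)[ncol(counts)] <- "count"

head(counts)

# Sanity check: every row of dat is counted exactly once
sum(counts$count) == n
```

Only combinations that actually occur appear in `counts`, so it has at most 64 rows.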

Other tips

Using table gives a slightly different result: it lists every possible combination, whether or not it is present in the data. Example data:

nam <- c("IDNum",paste0("Var",1:6))
n <- 5
set.seed(23)
dat <- setNames(data.frame(1:n,replicate(6,sample(0:1,n,replace=TRUE))),nam)


#  IDNum Var1 Var2 Var3 Var4 Var5 Var6
#1     1    1    0    1    0    1    1
#2     2    0    1    1    1    0    1
#3     3    0    1    0    1    0    1
#4     4    1    1    0    1    1    0
#5     5    1    1    1    1    0    1

Count them up:

data.frame(table(dat[-1]))

#   Var1 Var2 Var3 Var4 Var5 Var6 Freq
#1     0    0    0    0    0    0    0
#...
#28    1    1    0    1    1    0    1
#...
#43    0    1    0    1    0    1    1
#...
#47    0    1    1    1    0    1    1
#48    1    1    1    1    0    1    1
#...
#54    1    0    1    0    1    1    1
#...
#64    1    1    1    1    1    1    0
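If only the combinations that occur are wanted, the zero-count rows can be filtered out afterwards. The sketch below rebuilds the example data and adds two steps that are not in the original answer: converting the columns to factors with levels 0 and 1 (an assumption made here so that table always produces all 64 cells, even if a column happens to contain a single value), and a subset on Freq. The names `all_counts` and `present` are illustrative.

```r
nam <- c("IDNum", paste0("Var", 1:6))
n <- 5
set.seed(23)
dat <- setNames(data.frame(1:n, replicate(6, sample(0:1, n, replace = TRUE))), nam)

# Fix the levels so table() knows about both 0 and 1 in every column
dat[-1] <- lapply(dat[-1], factor, levels = 0:1)

all_counts <- data.frame(table(dat[-1]))  # all 2^6 combinations, zeros included
present <- subset(all_counts, Freq > 0)   # only the combinations that occur

nrow(all_counts)   # 64
sum(present$Freq)  # 5, one per row of dat
```

Filtering after the fact is simple but still builds the full 64-row table first; for many more variables the aggregate or dplyr approaches scale better because they only visit combinations that exist.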

You can also use the count function from dplyr:

library(dplyr)

r <- function() rbinom(10, 1, .5)
d <- data.frame(IDNum=1:10, Var1=r(), Var2=r())
d

d %>% count(Var1, Var2)

Output:

  Var1 Var2 n
1    0    0 3
2    0    1 3
3    1    0 1
4    1    1 3
Licensed under: CC-BY-SA with attribution