I have a large dataset with columns IDNum, Var1, Var2, Var3, Var4, Var5, Var6. The variables are boolean, each taking the value 0 or 1, so every row is one of 2^6 = 64 possible combinations. I would like to count the number of rows for each combination that is present. Is there an efficient way to do this in R?

Solution

aggregate can do this. Here's a smaller example with two variables:

r <- function() rbinom(10, 1, .5)
d <- data.frame(IDNum=1:10, Var1=r(), Var2=r())
d
   IDNum Var1 Var2
1      1    0    1
2      2    0    1
3      3    0    0
4      4    1    0
5      5    1    1
6      6    0    0
7      7    1    1
8      8    1    0
9      9    0    1
10    10    0    1

Now to count the number of each combination:

> aggregate(d$IDNum, d[-1], FUN=length)
  Var1 Var2 x
1    0    0 2
2    1    0 2
3    0    1 4
4    1    1 2

The values in d$IDNum don't affect the result, but aggregate needs some vector to split by combination. The elements of d$IDNum that fall into each combination are passed to length, which returns how many there are, i.e. the row count for that combination.
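
To illustrate that the values themselves are irrelevant, here is a minimal sketch using a small hand-made data frame (the numbers are made up for illustration, not the random data above). It swaps d$IDNum for a dummy vector of ones, which is my own choice of placeholder, and compares the two sets of counts:

```r
# Hypothetical toy data with the same layout as the example above.
d <- data.frame(IDNum = 1:6,
                Var1 = c(0, 0, 1, 1, 0, 1),
                Var2 = c(1, 1, 0, 1, 1, 0))

# Count rows per combination, splitting IDNum by the Var columns.
by_id <- aggregate(d$IDNum, d[-1], FUN = length)

# Same call with a dummy vector of ones: only the number of elements
# landing in each group matters, not what they are.
by_dummy <- aggregate(rep(1, nrow(d)), d[-1], FUN = length)

# The count columns should agree.
identical(by_id$x, by_dummy$x)
```

Any vector with one element per row would do; IDNum is just convenient because it is already in the data.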

Other answers

Using table gives a slightly different result: it lists all the possible combinations, whether or not they are present in the data. Example data:

nam <- c("IDNum",paste0("Var",1:6))
n <- 5
set.seed(23)
dat <- setNames(data.frame(1:n,replicate(6,sample(0:1,n,replace=TRUE))),nam)


#  IDNum Var1 Var2 Var3 Var4 Var5 Var6
#1     1    1    0    1    0    1    1
#2     2    0    1    1    1    0    1
#3     3    0    1    0    1    0    1
#4     4    1    1    0    1    1    0
#5     5    1    1    1    1    0    1

Count 'em up:

data.frame(table(dat[-1]))

#   Var1 Var2 Var3 Var4 Var5 Var6 Freq
#1     0    0    0    0    0    0    0
#...
#28    1    1    0    1    1    0    1
#...
#43    0    1    0    1    0    1    1
#...
#47    0    1    1    1    0    1    1
#48    1    1    1    1    0    1    1
#...
#54    1    0    1    0    1    1    1
#...
#64    1    1    1    1    1    1    0
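
If only the combinations that actually occur are wanted, the zero-count rows can be filtered out afterwards. A minimal sketch, with the five rows from the printout above typed in by hand; the names vars, counts and present are just illustrative. One caveat worth hedging against: table only knows about the values it sees in each column, so the sketch fixes the factor levels to 0 and 1 to be sure of getting all 64 cells.

```r
# The five example rows, typed in by hand from the printout above.
dat <- data.frame(IDNum = 1:5,
                  Var1 = c(1, 0, 0, 1, 1),
                  Var2 = c(0, 1, 1, 1, 1),
                  Var3 = c(1, 1, 0, 0, 1),
                  Var4 = c(0, 1, 1, 1, 1),
                  Var5 = c(1, 0, 0, 1, 0),
                  Var6 = c(1, 1, 1, 0, 1))

# Fix the levels so a column that happens to be constant still
# contributes both 0 and 1 to the cross-tabulation.
vars <- lapply(dat[-1], factor, levels = 0:1)

# All 2^6 = 64 combinations, with a Freq column of counts.
counts <- data.frame(table(vars))

# Keep only the combinations that are present in the data.
present <- counts[counts$Freq > 0, ]

nrow(counts)
nrow(present)
```

Here present should hold one row per distinct combination in dat, which is the same information aggregate returns.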

You can also use the count function from dplyr:

library(dplyr)

r <- function() rbinom(10, 1, .5)
d <- data.frame(IDNum=1:10, Var1=r(), Var2=r())
d

d %>% count(Var1, Var2)

Output:

  Var1 Var2 n
1    0    0 3
2    0    1 3
3    1    0 1
4    1    1 3
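
For the six-variable data in the question, the same call just takes more columns. A sketch, assuming dplyr is installed and using hand-typed illustrative data rather than the asker's real dataset (dat and res are names I made up):

```r
library(dplyr)

# Hypothetical data: six 0/1 columns, with row 6 repeating row 1.
dat <- data.frame(IDNum = 1:6,
                  Var1 = c(1, 0, 0, 1, 1, 1),
                  Var2 = c(0, 1, 1, 1, 1, 0),
                  Var3 = c(1, 1, 0, 0, 1, 1),
                  Var4 = c(0, 1, 1, 1, 1, 0),
                  Var5 = c(1, 0, 0, 1, 0, 1),
                  Var6 = c(1, 1, 1, 0, 1, 1))

# One row per combination that occurs, with its count in column n.
res <- dat %>% count(Var1, Var2, Var3, Var4, Var5, Var6)
res
```

Like aggregate, count only returns the combinations that are present, so there are no zero-count rows to filter out.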

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow