Question

I have a large dataset with columns IDNum, Var1, Var2, Var3, Var4, Var5, Var6. The variables are boolean, with values of either 0 or 1, so each row is one of 2^6 = 64 possible combinations. I would like to count the number of rows for each combination that is present. Is there an efficient way to do this in R?


Solution

aggregate can do this. Here is a smaller example with only two variables:

r <- function() rbinom(10, 1, .5)
d <- data.frame(IDNum=1:10, Var1=r(), Var2=r())
d
   IDNum Var1 Var2
1      1    0    1
2      2    0    1
3      3    0    0
4      4    1    0
5      5    1    1
6      6    0    0
7      7    1    1
8      8    1    0
9      9    0    1
10    10    0    1

Now to count the number of each combination:

> aggregate(d$IDNum, d[-1], FUN=length)
  Var1 Var2 x
1    0    0 2
2    1    0 2
3    0    1 4
4    1    1 2

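The values in d$IDNum are not actually used: aggregate splits d$IDNum into one group per combination of the remaining columns, and length() of each group is the number of rows with that combination. Any column with one value per row would work just as well; something simply has to be passed to length.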
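For the full six-variable layout in the question, the same idea can be written with aggregate's formula interface, where `.` on the right-hand side stands for every column other than IDNum. This is a minimal sketch on hypothetical simulated data named dat (not the asker's data); the renaming of the count column to n is just a cosmetic choice.

```r
# Hypothetical example data shaped like the question: IDNum plus six 0/1 columns.
set.seed(1)
n <- 1000
dat <- data.frame(IDNum = seq_len(n),
                  matrix(rbinom(n * 6, 1, 0.5), ncol = 6,
                         dimnames = list(NULL, paste0("Var", 1:6))))

# IDNum ~ . means "split IDNum by every other column";
# length() then counts the rows in each observed combination.
counts <- aggregate(IDNum ~ ., data = dat, FUN = length)
names(counts)[names(counts) == "IDNum"] <- "n"

head(counts)
sum(counts$n) == nrow(dat)  # TRUE: every row is counted exactly once
```

As with the two-variable call, only combinations that actually occur appear in the result.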

Other tips

This gives a slightly different result: it lists every possible combination, whether or not it occurs in the data, and reports a count (Freq) of 0 for combinations that are absent. Example data:

nam <- c("IDNum",paste0("Var",1:6))
n <- 5
set.seed(23)
dat <- setNames(data.frame(1:n,replicate(6,sample(0:1,n,replace=TRUE))),nam)


#  IDNum Var1 Var2 Var3 Var4 Var5 Var6
#1     1    1    0    1    0    1    1
#2     2    0    1    1    1    0    1
#3     3    0    1    0    1    0    1
#4     4    1    1    0    1    1    0
#5     5    1    1    1    1    0    1

Count them up:

data.frame(table(dat[-1]))

#   Var1 Var2 Var3 Var4 Var5 Var6 Freq
#1     0    0    0    0    0    0    0
#...
#28    1    1    0    1    1    0    1
#...
#43    0    1    0    1    0    1    1
#...
#47    0    1    1    1    0    1    1
#48    1    1    1    1    0    1    1
#...
#54    1    0    1    0    1    1    1
#...
#64    1    1    1    1    1    1    0
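One caveat: table() only uses the values that actually occur in each column, so if a variable happens to be constant (say Var6 is 1 in every row), the result has fewer than 64 rows. Converting each column to a factor with levels 0 and 1 first guarantees all 2^6 = 64 combinations. A small sketch using hypothetical three-row data:

```r
# Hypothetical data in which Var6 happens to be 1 in every row.
dat <- data.frame(IDNum = 1:3,
                  Var1 = c(1, 0, 1), Var2 = c(0, 1, 1), Var3 = c(1, 1, 0),
                  Var4 = c(0, 1, 1), Var5 = c(1, 0, 0), Var6 = c(1, 1, 1))

# Without explicit levels, Var6 contributes only the level "1": 2^5 = 32 rows.
partial <- data.frame(table(dat[-1]))
nrow(partial)   # 32

# Declaring both levels for every column restores all 64 combinations.
vars <- lapply(dat[-1], factor, levels = 0:1)
full <- data.frame(table(vars))
nrow(full)      # 64
sum(full$Freq)  # 3, one per row of dat
```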

You can also use the count() function from dplyr:

library(dplyr)

r <- function() rbinom(10, 1, .5)
d <- data.frame(IDNum=1:10, Var1=r(), Var2=r())
d

d %>% count(Var1, Var2)

Output:

  Var1 Var2 n
1    0    0 3
2    0    1 3
3    1    0 1
4    1    1 3
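With all six columns, the call simply lists each of them. By default count() reports only the combinations that occur; if the columns are first turned into factors with levels 0 and 1, .drop = FALSE keeps the empty combinations too, giving all 64 rows like the table() approach. A sketch, assuming dplyr 1.0 or later (for across()) and hypothetical simulated data:

```r
library(dplyr)

# Hypothetical data shaped like the question: IDNum plus six 0/1 columns.
set.seed(42)
n <- 20
dat <- data.frame(IDNum = seq_len(n),
                  matrix(rbinom(n * 6, 1, 0.5), ncol = 6,
                         dimnames = list(NULL, paste0("Var", 1:6))))

# Observed combinations only: one row per combination that occurs.
observed <- dat %>% count(Var1, Var2, Var3, Var4, Var5, Var6)

# All 64 combinations: make each Var a factor with levels 0 and 1,
# then ask count() to keep groups that have no rows.
all_combos <- dat %>%
  mutate(across(starts_with("Var"), ~ factor(.x, levels = 0:1))) %>%
  count(Var1, Var2, Var3, Var4, Var5, Var6, .drop = FALSE)
```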
License: CC-BY-SA with attribution
Not affiliated with StackOverflow