Question

Example: I have a data frame with many individuals and three variables: year (integer), gender (factor: male/female) and union (factor: yes/no). I would like to calculate the probability of being a union member given year and gender. I usually do this with aggregate(). As I am doing this all the time, I'm looking for a short and fast way to do it in dplyr.

Kind regards, Peter


Solution

Here is the dplyr equivalent to @droopy's answer:

library(dplyr)

# %.% was dplyr's early chaining operator; current versions use %>% instead
x %>%
  group_by( year, gender ) %>%
  summarise( P = mean(union == "yes") )

Source: local data frame [8 x 3]
Groups: year

  year gender   P
1 2001 female 1.0
2 2001   male 0.5
3 2002 female 0.5
4 2002   male 0.0
5 2003 female 0.0
6 2003   male 0.5
7 2004 female 0.5
8 2004   male 0.0
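
Why this works: union == "yes" is a logical vector, and mean() counts TRUE as 1 and FALSE as 0, so the mean is simply the share of union members in each group. A tiny sketch with a made-up vector u (hypothetical, not taken from the data above):

```r
# Hypothetical toy vector, only for illustration
u <- c("yes", "no", "yes", "yes")

is_member <- u == "yes"   # logical: TRUE FALSE TRUE TRUE
n_members <- sum(is_member)   # TRUE counts as 1, so this is 3
p_member  <- mean(is_member)  # 3 of 4 entries are "yes", so this is 0.75
```

The same idea carries over to any yes/no column, as long as the comparison string matches the factor level exactly.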

... and for completeness also the data.table solution:

as.data.table(x)[ , list( P = mean( union == "yes" ) ), by = list( year, gender )  ]

   year gender   P
1: 2001   male 0.5
2: 2001 female 1.0
3: 2002   male 0.0
4: 2002 female 0.5
5: 2003   male 0.5
6: 2003 female 0.0
7: 2004   male 0.0
8: 2004 female 0.5

Other tips

Something like this?

library(plyr)
# union is drawn at random, so the resulting P values differ from run to run
x <- data.frame(year=rep(2001:2004, each=4), gender=rep(c("male", "female"), 8), union=sample(c("yes", "no"), 16, replace=TRUE))
ddply(x, .(year, gender), summarize, P=mean(union=="yes"))
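
For comparison, the aggregate() approach mentioned in the question might look roughly like the sketch below. The toy data and the helper column member are made up for illustration (fixed values instead of the sample() call above, so the result is reproducible); this is only one way to write it in base R.

```r
# Hypothetical toy data, fixed so the result is the same on every run
x <- data.frame(
  year   = rep(2001:2002, each = 4),
  gender = rep(c("male", "female"), 4),
  union  = c("yes", "no", "no", "no", "yes", "yes", "no", "yes")
)

# Helper column (an assumption of this sketch): TRUE for union members
x$member <- x$union == "yes"

# Mean of the logical column per year/gender group = share of members
res <- aggregate(member ~ year + gender, data = x, FUN = mean)
names(res)[names(res) == "member"] <- "P"
res
```

The grouping logic is the same as in the dplyr, data.table and plyr versions; only the syntax differs.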
License: CC BY-SA with attribution
Not affiliated with StackOverflow