Question

I am using dplyr and I am wondering whether it is possible to compute differences between groups in one line. As in the small example below, the task is to compute the difference between groups A and Bs standardized "cent" variables.

library(dplyr)
# creating a small data.frame
GROUP <- rep(c("A","B"),each=10)
NUMBE <- rnorm(20,50,10)
datf <- data.frame(GROUP,NUMBE)

datf2 <- datf %.% group_by(GROUP) %.% mutate(cent = (NUMBE - mean(NUMBE))/sd(NUMBE))

gA <- datf2 %.% ungroup() %.% filter(GROUP == "A") %.% select(cent)
gB <- datf2 %.% ungroup() %.% filter(GROUP == "B") %.% select(cent)

gA - gB

This is of course no problem by creating different objects - but is there a more "built in" way of performing this task? Something more like this not working fantasy code below?

datf2 %.% summarize(filter(GROUP == "A",select(cent)) - filter(GROUP == "B",select(cent)))

Thank you!

Was it helpful?

Solution

Given we have 10 of each group, add an index 1:10, 1:10 and summarize over that with difference:

> datf2$entry=c(1:10,1:10)
> datf2 %.% ungroup() %.% group_by(entry) %.% summarize(d=cent[1]-cent[2])
Source: local data frame [10 x 2]

   entry          d
1      1 -0.8272879
2      2 -0.9159827
3      3 -0.5064762
4      4  0.4211639
5      5  1.3681720
6      6  3.3430289
7      7  1.0086822
8      8 -0.6163907
9      9 -0.7325220
10    10 -2.5423875

compare:

> gA - gB
         cent
1  -0.8272879
2  -0.9159827
3  -0.5064762
4   0.4211639
5   1.3681720
6   3.3430289
7   1.0086822
8  -0.6163907
9  -0.7325220
10 -2.5423875

Is there a way to inject the entry field into the data or the dplyr call? I'm not sure, it seems to rely on the functions knowing too much about the data...

OTHER TIPS

Thank you for the inspiration. I further developed this solution to that:

mutate(datf2,diffence = filter(datf2, GROUP == "A")$cent - filter(datf2, GROUP == "B")$cent)

This adds the result as column in the the data.frame.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top