How to subtract groups within a data frame?

https://stackoverflow.com/questions/23668510

r
dataframe

23-07-2023

Question

Take this data frame...

df <- data.frame(cat = rep(c('cat1','cat2','cat3'),each=3),
                 subcat = rep(c('a','b','c'),3),
                 y1 = c(rep(10,3),rep(1,6)),
                 y2 = c(rep(10,3),1:6))

df:

cat subcat y1 y2
cat1      a 10 10
cat1      b 10 10
cat1      c 10 10
cat2      a  1  1
cat2      b  1  2
cat2      c  1  3
cat3      a  1  4
cat3      b  1  5
cat3      c  1  6

I want to subtract the cat2 and cat3 values from the cat1 values within each subcat, and label the result something like new.cat1. The result should be a data frame that looks like this (or the rows could simply be appended to df):

     cat subcat y1 y2
new.cat1      a  8  5
new.cat1      b  8  3
new.cat1      c  8  1

In this example there is only one sub-category column, but I'm looking for a method that could handle several. Any help?

Solution

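You can try aggregate with a formula, as below. Multiplying the values by 1 for cat1 rows and by -1 for all other rows turns the subtraction into a plain sum within each subcat.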

df
##    cat subcat y1 y2
## 1 cat1      a 10 10
## 2 cat1      b 10 10
## 3 cat1      c 10 10
## 4 cat2      a  1  1
## 5 cat2      b  1  2
## 6 cat2      c  1  3
## 7 cat3      a  1  4
## 8 cat3      b  1  5
## 9 cat3      c  1  6

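# The formula is passed positionally; naming it `formula =` can fail on
# recent R versions, where the method's first argument is called `x`.
res <- aggregate(cbind(y1, y2) * ifelse(cat == "cat1", 1, -1) ~ subcat, data = df,
    FUN = sum)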
cbind(cat = "new.cat1", res)
##        cat subcat y1 y2
## 1 new.cat1      a  8  5
## 2 new.cat1      b  8  3
## 3 new.cat1      c  8  1
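
The question also asks about several sub-category columns and about appending the result to df. Below is a minimal sketch of one way to do both with the same signed-sum idea. The second grouping column, subcat2, is hypothetical and added only for illustration; the names new_rows and combined are likewise made up here.

```r
# Data from the question
df <- data.frame(cat = rep(c("cat1", "cat2", "cat3"), each = 3),
                 subcat = rep(c("a", "b", "c"), 3),
                 y1 = c(rep(10, 3), rep(1, 6)),
                 y2 = c(rep(10, 3), 1:6))

# Hypothetical second sub-category column, for illustration only
df$subcat2 <- "x"

# Signed sum within every subcat/subcat2 combination:
# cat1 rows count as +1, all other rows as -1
res <- aggregate(cbind(y1, y2) * ifelse(cat == "cat1", 1, -1) ~ subcat + subcat2,
                 data = df, FUN = sum)

# Label the new rows, then append them to df with matching column order
new_rows <- cbind(cat = "new.cat1", res)
combined <- rbind(df, new_rows[names(df)])
```

Further grouping columns would be added to the right-hand side of the formula in the same way.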

OTHER TIPS

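You could use plyr::ddply. Note that y1[1] and y1[2:3] pick rows by position, so this assumes the cat1 row comes first within each subcat group. Not sure how you want it appended to df though.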

library(plyr)
ddp <- ddply(df, .(subcat), summarize,
        y1 = sum(y1[1], -y1[2:3]), y2 = sum(y2[1], -y2[2:3]))
cbind(cat = 'new.cat1', ddp)
#        cat subcat y1 y2
# 1 new.cat1      a  8  5
# 2 new.cat1      b  8  3
# 3 new.cat1      c  8  1
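
If the row order within each group is not guaranteed, the same split-apply-combine idea can select rows by their cat value instead of by position. The following is a rough base R sketch of that variant, not the answerer's code; the helper diff_one and the vector value_cols are hypothetical names introduced here.

```r
# Data from the question
df <- data.frame(cat = rep(c("cat1", "cat2", "cat3"), each = 3),
                 subcat = rep(c("a", "b", "c"), 3),
                 y1 = c(rep(10, 3), rep(1, 6)),
                 y2 = c(rep(10, 3), 1:6))

value_cols <- c("y1", "y2")

# For the rows of one sub-category: cat1 total minus the total of all other cats
diff_one <- function(piece) {
  is_base <- piece$cat == "cat1"
  colSums(piece[is_base, value_cols, drop = FALSE]) -
    colSums(piece[!is_base, value_cols, drop = FALSE])
}

# Split by subcat, apply the helper, and bind the results into a matrix
res <- do.call(rbind, lapply(split(df, df$subcat), diff_one))

# Turn the matrix into a data frame labelled like the desired output
out <- data.frame(cat = "new.cat1", subcat = rownames(res), res,
                  row.names = NULL)
```

Because the rows are chosen with piece$cat == "cat1", shuffling df beforehand should not change the result.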

Licensed under: CC-BY-SA with attribution
