Question

Task: For all rows where condition==FALSE, set groupmean to the mean of all numbers within the group. For all rows where condition==TRUE, set groupmean to the mean of only those numbers where condition==TRUE, again within the group. I would like a solution that adds the desired column in place rather than copying the whole data.table. I bet there is a plain, simple solution, but I got a little lost.

My attempts so far:

set.seed(42)
require(data.table)

DT <- data.table(condition=sample(c(TRUE,FALSE), 50, replace=TRUE),
                 group=rep(LETTERS[1:4], times=25),
                 numbers=1:100)

# updates the right rows (condition==FALSE), but the mean covers only those rows
DT[condition==FALSE, groupmean_1 := mean(numbers), by=group]

# right values for condition==FALSE, but assigned to every row
DT[, groupmean_2 := mean(numbers), by=group]

head(DT)
     condition group numbers groupmean_1 groupmean_2
1:     FALSE     A       1    42.66667          49
2:     FALSE     B       2    55.68421          50
3:      TRUE     C       3          NA          51
4:     FALSE     D       4    47.78947          52
5:     FALSE     A       5    42.66667          49
6:     FALSE     B       6    55.68421          50

Solution

Reverse the order in which you define groupmean: first compute the group mean over all rows, then overwrite the rows where condition == TRUE with the group mean of only those rows. Both steps use := and therefore modify DT in place without copying it.

DT[, groupmean:=mean(numbers), by=group]
DT[condition==TRUE, groupmean:=mean(numbers), by='group,condition']

I hope that helps
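
As a rough check of the two-step approach, the sketch below compares it against a single grouped assignment. It is only an illustration under some assumptions: the column name groupmean_alt is hypothetical, and condition is drawn with 100 values here so that all columns have the same length (the question draws 50 and relies on recycling).

```r
library(data.table)

set.seed(42)
# Assumption: 100 draws for condition, so no recycling is needed.
DT <- data.table(condition = sample(c(TRUE, FALSE), 100, replace = TRUE),
                 group     = rep(LETTERS[1:4], times = 25),
                 numbers   = 1:100)

# Two-step approach: group mean for every row, then overwrite the TRUE rows
# with the mean of the TRUE rows only. Both assignments happen by reference.
DT[, groupmean := mean(numbers), by = group]
DT[condition == TRUE, groupmean := mean(numbers), by = group]

# One-pass variant (hypothetical column groupmean_alt), for comparison only:
# within each group, TRUE rows get the mean of the TRUE rows,
# FALSE rows get the mean of the whole group.
DT[, groupmean_alt := ifelse(condition, mean(numbers[condition]), mean(numbers)),
   by = group]

# The two columns should agree row by row.
stopifnot(isTRUE(all.equal(DT$groupmean, DT$groupmean_alt)))
```

Grouping by group alone is enough in the second step, because the subset condition == TRUE already fixes the value of condition.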

Licensed under: CC-BY-SA with attribution