R data.table: adding new column for subset of rows conditional on all rows

https://stackoverflow.com/questions/23513901

r
data.table

16-07-2023
Question

Task: For all condition==FALSE, set groupmean to mean of all numbers by group. For all condition==TRUE set groupmean to mean of numbers only where condition==TRUE by group. I would like to have a solution which does not require copying the whole data.table but adds the desired column in place. I bet there's a plain simple solution, but I got lost a little...

My attempts so far:

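set.seed(42)
library(data.table)

# condition holds only 50 draws; repeat them explicitly to fill all 100 rows
DT <- data.table(condition=rep(sample(c(TRUE,FALSE), 50, replace=TRUE), length.out=100),
                 group=rep(LETTERS[1:4], times=25),
                 numbers=1:100)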

# modifies the right rows, but wrong value
DT[condition==FALSE, groupmean_1 := mean(numbers), by=group]

# right values, but not only rows where condition=FALSE
DT[, groupmean_2 := mean(numbers), by=group]

head(DT)
     condition group numbers groupmean_1 groupmean_2
1:     FALSE     A       1    42.66667          49
2:     FALSE     B       2    55.68421          50
3:      TRUE     C       3          NA          51
4:     FALSE     D       4    47.78947          52
5:     FALSE     A       5    42.66667          49
6:     FALSE     B       6    55.68421          50

Solution

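Reverse the order in which you assign groupmean. First compute it as the group mean over all rows, then overwrite only the rows where condition == TRUE with the group mean of the TRUE rows. Both steps use :=, so the column is added and updated by reference without copying DT.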

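# step 1: group mean over all rows, assigned to every row
DT[, groupmean:=mean(numbers), by=group]
# step 2: overwrite the TRUE rows with the mean of the TRUE rows only
DT[condition==TRUE, groupmean:=mean(numbers), by='group,condition']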
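
To check the two steps by hand, here is a minimal sketch on a small made-up table. The table `toy` and the column `groupmean_alt` are hypothetical names used only for illustration. The single-pass `ifelse` variant is an alternative suggestion, not part of the original answer.

```r
library(data.table)

# Hypothetical toy data (not from the question), small enough to verify by hand
toy <- data.table(
  condition = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  group     = c("A", "A", "A", "B", "B", "B"),
  numbers   = c(1, 2, 6, 10, 20, 60)
)

# Two-step approach: all-row group mean first, then overwrite the TRUE rows
toy[, groupmean := mean(numbers), by = group]
toy[condition == TRUE, groupmean := mean(numbers), by = group]

# Single-pass alternative (an assumption, not from the answer): within each
# group, TRUE rows take the TRUE-only mean and FALSE rows take the all-row mean
toy[, groupmean_alt := ifelse(condition, mean(numbers[condition]), mean(numbers)),
    by = group]

print(toy)
# Group A: all-row mean is 3,  TRUE-only mean is 3.5
# Group B: all-row mean is 30, TRUE-only mean is 20
```

In the second step the filter already restricts the rows to condition == TRUE, so grouping by group alone gives the same result as grouping by group and condition.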

I hope that helps

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow