R data.table: adding new column for subset of rows conditional on all rows

StackOverflow https://stackoverflow.com/questions/23513901

  •  16-07-2023

Question

Task: For all rows where condition==FALSE, set groupmean to the mean of numbers over all rows of the same group. For all rows where condition==TRUE, set groupmean to the mean of numbers over only the condition==TRUE rows of that group. I would like a solution that does not copy the whole data.table but adds the desired column in place. I bet there's a plain simple solution, but I got a little lost...

My attempts so far:

set.seed(42)
library(data.table)

DT <- data.table(condition=sample(c(TRUE,FALSE), 50, replace=TRUE),
                 group=rep(LETTERS[1:4], times=25),
                 numbers=1:100)

# modifies the right rows, but wrong value
DT[condition==FALSE, groupmean_1 := mean(numbers), by=group]

# right values, but not only rows where condition=FALSE
DT[, groupmean_2 := mean(numbers), by=group]

head(DT)
     condition group numbers groupmean_1 groupmean_2
1:     FALSE     A       1    42.66667          49
2:     FALSE     B       2    55.68421          50
3:      TRUE     C       3          NA          51
4:     FALSE     D       4    47.78947          52
5:     FALSE     A       5    42.66667          49
6:     FALSE     B       6    55.68421          50

Solution

Reverse the order in which you define groupmean: first compute it as the group mean over all rows, then overwrite it for the rows where condition == TRUE. Both steps use := and therefore modify DT in place.

DT[, groupmean:=mean(numbers), by=group]
DT[condition==TRUE, groupmean:=mean(numbers), by='group,condition']
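
To make the effect of the two steps easy to check by hand, here is a minimal sketch on a small table. The toy data and the column name groupmean_alt are made up for illustration, and the one-pass ifelse variant is an alternative I am adding, not part of the approach above. Since the second step already filters on condition==TRUE, grouping by group alone appears to be enough there.

```r
library(data.table)

# Hypothetical toy data, small enough to verify by hand
DT <- data.table(
  condition = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  group     = c("A", "A", "A", "B", "B", "B"),
  numbers   = c(1, 2, 6, 10, 20, 60)
)

# Step 1: mean over all rows of each group, assigned by reference
DT[, groupmean := mean(numbers), by = group]

# Step 2: overwrite only the condition == TRUE rows with the mean
# of the TRUE rows in each group
DT[condition == TRUE, groupmean := mean(numbers), by = group]

# One-pass variant (an assumption-free alternative, not from the answer):
# pick per row between the TRUE-only mean and the all-rows mean
DT[, groupmean_alt := ifelse(condition,
                             mean(numbers[condition]),
                             mean(numbers)),
   by = group]

print(DT)
# groupmean is 3.5, 3, 3.5 for group A and 30, 20, 30 for group B
```

In group A the all-rows mean is (1+2+6)/3 = 3 and the TRUE-only mean is (1+6)/2 = 3.5, so the FALSE row gets 3 and the TRUE rows get 3.5. Neither variant copies the table, because := assigns by reference.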

I hope that helps

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow