Question

I have a data.table DT with a column named RF and many columns whose names contain an underscore (_). I want to loop through all the columns with an underscore and subtract the RF column from each of them. However, I'm stuck: it seems that anything on the RHS of the := operator in a data.table does not work with dynamic (programmatically chosen) column names.

Here is my DT and the desired output (hardcoded):

library(data.table)
DT <- data.table(RF  = 1:10,
                 S_1 = 11:20,
                 S_2 = 21:30)
# Desired output
DT[ , S_1 := S_1 - RF]
DT[ , S_2 := S_2 - RF]
DT
#       RF S_1 S_2
#  [1,]  1  10  20
#  [2,]  2  10  20
#  [3,]  3  10  20
# ...

However, I want this to be more flexible, i.e. loop through every column with "_" in its name and subtract RF:

# Attempt 1: does not work. Interestingly, i on the LHS of := is interpreted as
# the column at position i, but on the RHS of := it is evaluated as the plain
# numbers 2 and 3, respectively (so the columns become 2 - 1 and 3 - 1)
for (i in grep("_", names(DT))){
  DT[ , i:= i - 1, with=FALSE]
}
DT
#       RF S_1 S_2
#  [1,]  1   1   2
#  [2,]  2   1   2
#  [3,]  3   1   2
# ...

# Attempt 2: work with parse() and eval()
for (i in grep("_", names(DT), value=TRUE)){
  DT[ , eval(parse(text=i)):= eval(parse(text=i)) - RF]
}
# Error in eval(expr, envir, enclos) : object 'S_1' not found

Any hints on how to do this would be great.

EDIT: As soon as I posted the question, I asked myself why I was using the := operator in the first place, and realized that I don't have to. This works and doesn't need a loop:

DT[, grep("_", names(DT)), with=FALSE] - DT[, RF]

Sorry about that. However, I'm leaving the question open because I'm still interested in why my approach with the := operator doesn't work, so maybe someone can help me there.


Solution

You were on the right track with your second attempt. Here is an approach that uses substitute() to build the expression that gets passed in as the j argument in DT[ , j].

for (i in grep("_", names(DT), value=TRUE)){
    e <- substitute(X := X - RF, list(X = as.symbol(i)))
    DT[ , eval(e)]
}
DT
#     RF S_1 S_2
# [1,]  1  10  20
# [2,]  2  10  20
# [3,]  3  10  20
# [4,]  4  10  20
# [5,]  5  10  20
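To see what substitute() actually builds, you can inspect the resulting call object directly. The sketch below uses only base R and the example column name "S_1"; it checks that the placeholder X has been replaced by the symbol S_1 on both sides of :=, giving the unevaluated expression that DT[ , eval(e)] then uses as its j argument.

```r
# Build the j expression for one column name (base R only, no data.table needed)
col <- "S_1"
e <- substitute(X := X - RF, list(X = as.symbol(col)))

# e is an unevaluated call with three parts
print(is.call(e))  # TRUE: nothing has been evaluated yet
print(e[[1]])      # the function symbol `:=`
print(e[[2]])      # LHS: the symbol S_1
print(e[[3]])      # RHS: the call S_1 - RF
```

Because the column name is spliced in as a symbol, data.table sees exactly the same expression as in the hard-coded DT[ , S_1 := S_1 - RF].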

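You could also use an LHS expression rather than a symbol. Wrapping i in parentheses makes data.table evaluate it to obtain the column name, and get(i) fetches that column's values on the RHS: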

for (i in grep("_", names(DT), value=TRUE))
    DT[, (i) := get(i)-RF]
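The same (cols) := idea also works for all columns at once, without an explicit loop. This is a swapped-in alternative, not part of the original answer: a minimal sketch, assuming the example DT from the question, that combines .SDcols with lapply().

```r
library(data.table)

DT <- data.table(RF  = 1:10,
                 S_1 = 11:20,
                 S_2 = 21:30)

# Names of the columns that contain an underscore
cols <- grep("_", names(DT), value = TRUE)

# (cols) on the LHS: the parentheses make data.table evaluate cols to get the
# target column names. lapply() over .SD subtracts RF from each of them;
# RF is not listed in .SDcols but can still be referenced inside j.
DT[, (cols) := lapply(.SD, function(x) x - RF), .SDcols = cols]

print(DT)
```

Like the loop, this updates DT by reference, so no reassignment is needed.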

OTHER TIPS

A workaround, which unfortunately I only discovered after posting the question, is as follows:

DT[, .SD, .SDcols = patterns('_')] - DT[, RF]
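This works because arithmetic between a data frame and a vector with one element per row is applied within each column, so RF is subtracted from every selected column. The sketch below shows the mechanism with a small, hypothetical base R data.frame standing in for DT; note that the result is a new object and the input is left unchanged.

```r
# Hypothetical small data.frame standing in for DT
df <- data.frame(RF = 1:3, S_1 = 11:13, S_2 = 21:23)

# Select the underscore columns and subtract RF from each of them.
# The RF vector is recycled column by column, one value per row.
res <- df[grep("_", names(df))] - df$RF

print(res)   # S_1 is all 10, S_2 is all 20
print(df)    # the original columns are untouched
```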

This also works in a more complicated setting where there are additional columns you want to keep, but it takes some extra effort:

library(data.table)
DT <- data.table(RF  = 1:10,
                 S_1 = 11:20,
                 S_2 = 21:30,
                 addCol = rnorm(10)) # column that RF should not be subtracted from, but that is still kept

# Note: RF itself is not carried over into the rebuilt DT
DT <- cbind(DT[, .SD, .SDcols = patterns("_")] - DT[, RF], addCol = DT[, addCol])

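Updated to use set() together with the .. prefix (which tells data.table to look up varnames in the calling scope rather than treating it as a column name). set() is powerful because it assigns by reference with very little overhead: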

varnames <- grep("_", names(DT), value=TRUE)
set(DT, j = varnames, value = DT[, ..varnames] - DT[, RF])
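set() is documented as a low-overhead, loop-able version of :=, so it also fits the column-by-column loop from the question. A minimal sketch, assuming the example DT from the question (without addCol), that updates one column per set() call:

```r
library(data.table)

DT <- data.table(RF  = 1:10,
                 S_1 = 11:20,
                 S_2 = 21:30)

varnames <- grep("_", names(DT), value = TRUE)

# set() modifies DT in place; each call replaces one column.
# DT[[col]] and DT[["RF"]] are plain column vectors, so nothing
# needs to be parsed, substituted, or evaluated dynamically.
for (col in varnames) {
  set(DT, j = col, value = DT[[col]] - DT[["RF"]])
}

print(DT)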
Licensed under: CC-BY-SA with attribution