Loop through columns in a data.table and transform those columns
27-10-2019
Question
I have a data.table DT with a column named RF and many columns with an underscore (_) in their names. I want to loop through all the columns with an underscore and subtract the RF column from each of them. However, I'm stuck: it seems that dynamic variables do not work on the RHS of the := operator in a data.table.
Here is my DT and the desired output (hardcoded):
library(data.table)
DT <- data.table(RF = 1:10,
S_1 = 11:20,
S_2 = 21:30)
#Desired output
DT[ , S_1 := S_1 - RF]
DT[ , S_2 := S_2 - RF]
DT
RF S_1 S_2
[1,] 1 10 20
[2,] 2 10 20
[3,] 3 10 20
...
However, I want this to be more flexible, i.e. loop through every column with "_" in its name and subtract RF:
#1. try: Does not work; Interestingly, the i on the LHS of := is interpreted as the column i, but on the RHS of
#:= it is interpreted as 2 and 3, respectively
for (i in grep("_", names(DT))){
DT[ , i:= i - 1, with=FALSE]
}
DT
RF S_1 S_2
[1,] 1 1 2
[2,] 2 1 2
[3,] 3 1 2
...
#2. try: Work with parse and eval
for (i in grep("_", names(DT), value=TRUE)){
DT[ , eval(parse(text=i)):= eval(parse(text=i)) - RF]
}
#Error in eval(expr, envir, enclos) : object 'S_1' not found
Any hints how to do that would be great.
EDIT: As soon as I posted the question, I asked myself why I was working with the := operator in the first place, and sure enough, I realized I don't have to. This works and doesn't need a loop:
DT[, grep("_", names(DT)), with=FALSE] - DT[, RF]
Sorry about that. However, I'm leaving the question open because I'm still interested in why my approach with the := operator doesn't work. Maybe someone can help me there.
Solution
You were on the right track with your second attempt. Here is an approach that uses substitute to build the expression that gets passed in as the 'j' argument in DT[ , j ]:
for (i in grep("_", names(DT), value=TRUE)){
e <- substitute(X := X - RF, list(X = as.symbol(i)))
DT[ , eval(e)]
}
DT
# RF S_1 S_2
# [1,] 1 10 20
# [2,] 2 10 20
# [3,] 3 10 20
# [4,] 4 10 20
# [5,] 5 10 20
You could also use an LHS expression rather than a symbol:
for (i in grep("_", names(DT), value=TRUE))
DT[, (i) := get(i)-RF]
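The same update can probably also be written without an explicit loop. The following is only a minimal sketch, assuming a data.table version in which lapply(.SD, ...) combined with .SDcols can be used on the RHS of :=. The variable name cols is illustrative and does not appear in the answers above.

```r
library(data.table)

DT <- data.table(RF  = 1:10,
                 S_1 = 11:20,
                 S_2 = 21:30)

# Names of all columns containing an underscore (cols is an illustrative name)
cols <- grep("_", names(DT), value = TRUE)

# .SD holds only the columns listed in .SDcols; RF is still visible inside j,
# so RF is subtracted from each selected column and the result is assigned
# back by reference
DT[, (cols) := lapply(.SD, function(x) x - RF), .SDcols = cols]
```

Wrapping cols in parentheses on the LHS tells data.table to use the character vector stored in cols as the column names, in the same way as (i) in the loop above.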
OTHER TIPS
A workaround which I unfortunately discovered after I posted the question is as follows:
DT[, .SD, .SDcols = patterns('_')] - DT[, RF]
This also works in a more complicated setting in which there are additional columns you want to keep, but with some extra effort:
library(data.table)
DT <- data.table(RF = 1:10,
S_1 = 11:20,
S_2 = 21:30,
addCol = rnorm(10)) #Column that should not be subtracted by RF, but still kept in DT
DT <- cbind(DT[, .SD, .SDcols = patterns("_")] - DT[, RF], addCol = DT[, addCol])
Updated to use set() and the .. prefix; set is powerful (see edits for earlier attempt).
varnames <- grep("_", names(DT), value=TRUE)
set(DT, j = varnames, value = DT[, ..varnames] - DT[, RF])
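set() can also be called once per column inside a for loop, which stays close to the loop the question originally asked for. This is a minimal sketch under the same example data, not a benchmarked recommendation. The loop variable col is an illustrative name.

```r
library(data.table)

DT <- data.table(RF  = 1:10,
                 S_1 = 11:20,
                 S_2 = 21:30)

varnames <- grep("_", names(DT), value = TRUE)

# Update one column per iteration, by reference;
# DT[[col]] extracts the column as a plain vector
for (col in varnames) {
  set(DT, j = col, value = DT[[col]] - DT[["RF"]])
}
```

Because set() modifies DT in place, no reassignment of DT is needed after the loop.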