Insert values of row r into row (r+1) and insert 1 into the first row for multiple columns in a data.table

https://stackoverflow.com/questions/23502789

r
data.table

16-07-2023

Question

Given a data.table and a vector naming several target columns: what is the most efficient way to replace the target columns' values in row 1 with 1, and in every later row r with the values from row (r-1) plus 1?

The whole operation should be applied separately within each group defined by the key id1.

The original data.table and the target columns look like this:

library(data.table)

DT <- data.table(id1=c(1,1,1,2,2,2), id2=c(1,2,3,1,2,3), c1=c(0,1,0,2,1,2), c2=c(0,0,1,1,2,3), c3=c(1,2,2,1,1,1))
setkey(DT,id1,id2)
cnames <- c("c1","c2","c3")

DT
#    id1 id2 c1 c2 c3
# 1:   1   1  0  0  1
# 2:   1   2  1  0  2
# 3:   1   3  0  1  2
# 4:   2   1  2  1  1
# 5:   2   2  1  2  1
# 6:   2   3  2  3  1

This is the desired result:

#    id1 id2 c1 c2 c3
# 1:   1   1  1  1  1      # substituted by 1
# 2:   1   2  1  1  2      # previous row + 1
# 3:   1   3  2  1  3      #        "
# 4:   2   1  1  1  1      # substituted by 1
# 5:   2   2  3  2  2      # previous row + 1
# 6:   2   3  2  3  2      #        "

I know that something like DT[,"c1" := c(1,c1[.I-1]+1), by=id1] might work, but this poses two challenges. First, the first value of c1[.I-1] is not defined. Second, this code would perform the substitution for one column only (here: "c1"), whereas I need it performed for many columns, listed in the vector "cnames".

Thanks! Jana

Solution

The easiest way is to first set the first row of each group to all 0's, and then add 1 to every target column. This is equivalent to what you want to do. Here's how I'd do it:

setkey(DT, id1)
DT[J(unique(id1)), c(cnames) := list(0L), mult="first"]
DT[, c(cnames) := .SD+1L, .SDcols=cnames]

#    id1 id2 c1 c2 c3
# 1:   1   1  1  1  1
# 2:   1   2  2  1  3
# 3:   1   3  1  2  3
# 4:   2   1  1  1  1
# 5:   2   2  2  3  2
# 6:   2   3  3  4  2

Following OP's comment and edit to the question:

You can accomplish this as follows: first shift each column down by 1 row within each group, replacing the first row by 0's, and then add 1 to all the columns.

DT[, c(cnames) := lapply(.SD, function(x) 
            c(0L, head(x, -1L))), by=id1, .SDcols=cnames]
DT[, c(cnames) := .SD+1L, .SDcols=cnames]

> DT
#    id1 id2 c1 c2 c3
# 1:   1   1  1  1  1
# 2:   1   2  1  1  2
# 3:   1   3  2  1  3
# 4:   2   1  1  1  1
# 5:   2   2  3  2  2
# 6:   2   3  2  3  2
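
The same shift-then-add idea can probably be written in a single step with data.table's shift() function. This is only a sketch, assuming a data.table version that provides shift(); it is not part of the original answer, and the anonymous function wrapper is just one way to write it. It rebuilds the example data so it can be run on its own.

```r
library(data.table)

# Rebuild the example data from the question
DT <- data.table(id1 = c(1,1,1,2,2,2), id2 = c(1,2,3,1,2,3),
                 c1 = c(0,1,0,2,1,2), c2 = c(0,0,1,1,2,3), c3 = c(1,2,2,1,1,1))
setkey(DT, id1, id2)
cnames <- c("c1", "c2", "c3")

# Within each id1 group, lag every target column by one row,
# fill the group's first row with 0, then add 1
DT[, (cnames) := lapply(.SD, function(x) shift(x, n = 1L, fill = 0, type = "lag") + 1),
   by = id1, .SDcols = cnames]

DT
#    id1 id2 c1 c2 c3
# 1:   1   1  1  1  1
# 2:   1   2  1  1  2
# 3:   1   3  2  1  3
# 4:   2   1  1  1  1
# 5:   2   2  3  2  2
# 6:   2   3  2  3  2
```

Using fill = 0 followed by + 1 keeps the logic identical to the two-step version above: the first row of each group ends up as 1 and every other row as the previous row plus 1.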

Another variation, based on the question in your comment:

First shift the entire data down by 1 row, without grouping, and add 1 to it. Then set the first row of each group to all 1's.

setkey(DT, id1)
DT[2:nrow(DT), c(cnames) := head(DT[, ..cnames], -1L) + 1L]
DT[J(unique(id1)), c(cnames) := list(1L), mult="first"]
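
To check that the ungrouped variation gives the desired result, here is a self-contained sketch starting from the original data. Note one swap relative to the code above: instead of the keyed join with mult="first", it looks up the first row of each group with .I[1L], which is just an alternative way to find those rows. The name first_rows is my own, for illustration only.

```r
library(data.table)

# Rebuild the example data from the question
DT <- data.table(id1 = c(1,1,1,2,2,2), id2 = c(1,2,3,1,2,3),
                 c1 = c(0,1,0,2,1,2), c2 = c(0,0,1,1,2,3), c3 = c(1,2,2,1,1,1))
setkey(DT, id1)
cnames <- c("c1", "c2", "c3")

# Rows 2..n receive the values of rows 1..(n-1) plus 1, ignoring groups
DT[2:nrow(DT), (cnames) := head(DT[, ..cnames], -1L) + 1]

# Row numbers of the first row of each id1 group (here: 1 and 4)
first_rows <- DT[, .I[1L], by = id1]$V1

# Overwrite those rows with 1 in every target column
DT[first_rows, (cnames) := 1]

DT
#    id1 id2 c1 c2 c3
# 1:   1   1  1  1  1
# 2:   1   2  1  1  2
# 3:   1   3  2  1  3
# 4:   2   1  1  1  1
# 5:   2   2  3  2  2
# 6:   2   3  2  3  2
```

The ungrouped shift briefly carries values across group boundaries (row 4 receives row 3's values plus 1), but the second step overwrites exactly those rows, so the final table matches the grouped version.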

Licensed under: CC-BY-SA with attribution