Question

I am using data.table in R and looping over my table; it's really slow because of the table size. I wonder if someone has an idea on how to speed this up.

I have a set of values that I want to "cluster". Each row has a position, a positive integer. You can build a simple view of the data like this:

    library(data.table)
    # Here is a toy example: 100 positive integer positions, sorted ascending
    fulltable = seq(1, 4) * seq(1, 1000, 10)
    fulltable = data.table(pos = sort(fulltable))
    fulltable[, id := 1]

Then I loop over the rows, and whenever there is a gap of more than 50 between two consecutive positions I start a new group:

    # Here is the main loop
    lastposition = fulltable[1]$pos
    lastid = fulltable[1]$id
    for (i in 2:nrow(fulltable)) {
        # a gap of more than 50 from the previous position starts a new group
        if (fulltable[i]$pos - 50 > lastposition) {
            lastid = lastid + 1
            print(lastid)
        }
        fulltable[i]$id = lastid  # row-by-row assignment: this is what makes it slow
        lastposition = fulltable[i]$pos
    }

Any idea for an efficient way to do this?


Solution

    # Flag the rows that start a new group: wherever the gap from the
    # previous pos exceeds 50, assign increasing ids 2, 3, ...
    fulltable[which((c(fulltable$pos[-1], NA) - fulltable$pos) > 50) + 1, new_group := 2:(.N + 1)]
    # All remaining rows get the placeholder id 1
    fulltable[is.na(new_group), new_group := 1L]
    # Carry each group id forward with a cumulative max, then drop the helper column
    fulltable[, c("lastid_new", "new_group") := list(cummax(new_group), NULL)]
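For reference, the same grouping can also be written as a single `cumsum` over the gap flags; this is a common data.table idiom, shown here as a sketch on the toy table from the question (the column name `lastid_new` is chosen to match the solution's output):

    library(data.table)
    # Rebuild the toy table from the question
    fulltable = data.table(pos = sort(seq(1, 4) * seq(1, 1000, 10)))
    # A new group starts at row 1 and wherever the gap to the previous
    # position exceeds 50; cumsum turns those TRUE flags into group ids
    fulltable[, lastid_new := cumsum(c(TRUE, diff(pos) > 50))]

`cumsum(c(TRUE, diff(pos) > 50))` increments the id at exactly the rows where the original loop would, so the result should match the loop's `id` column while staying fully vectorized.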
Licensed under: CC-BY-SA with attribution