consecutive setdiff in data.table list

https://stackoverflow.com//questions/12666372

11-12-2019

Question

Using data organised as

library(data.table)
dtl <- replicate(10, data.table(id = sample(letters, 10), val = sample(10)), simplify = FALSE)
invisible(lapply(dtl, function(x) setkey(x, id)))  # setkey keys each table by reference

I need to extract a list of data.tables that contain the rows of dtl[[n+1]] whose id is not present in dtl[[n]]. I assume it would be something like

dtl2 <- list(setdiff(dtl[[1]][['id']], dtl[[2]][['id']]), setdiff(dtl[[2]][['id']], dtl[[3]][['id']]), ...)

Please note that, while the setdiff should consider only the id column, I expect the result to contain all columns of each data.table.
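To pin down the requested output, here is a minimal sketch on a small, made-up list of three keyed tables (the name toy and its ids and values are hypothetical, not from the question). It spells the operation out with a plain loop: for each n it keeps all columns of toy[[n + 1]] but only the rows whose id does not occur in toy[[n]].

```r
library(data.table)

# Hypothetical toy data: three small tables keyed on id
toy <- list(
  data.table(id = c("a", "b", "c"), val = 1:3, key = "id"),
  data.table(id = c("b", "c", "d"), val = 4:6, key = "id"),
  data.table(id = c("a", "d", "e"), val = 7:9, key = "id")
)

# For each n, keep the rows of toy[[n + 1]] whose id does not occur in toy[[n]]
new_rows <- vector("list", length(toy) - 1)
for (n in seq_len(length(toy) - 1)) {
  prev <- toy[[n]]
  nxt  <- toy[[n + 1]]
  new_rows[[n]] <- nxt[!(id %in% prev$id)]  # all columns, filtered rows
}
new_rows
```

With this data the first element is the single row d/6 and the second holds the rows a/7 and e/9.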

Solution

I think this will do it for you:

mapply(setdiff, head(dtl, -1), tail(dtl, -1), SIMPLIFY = FALSE)

Edit: with your new expected output, I would still use mapply as above, but with one of the following two changes:

replace setdiff with function(x, y) setdiff(x$id, y$id), or
replace dtl with ids, built as ids <- lapply(dtl, "[[", "id") so that each element is the plain id vector
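A small sketch of both variants on a made-up three-table list (toy is hypothetical data). Both return plain character vectors of ids, computed as setdiff of the earlier table's ids against the later one's, matching the direction of the question's original code. Note that `[[` rather than `[` pulls the id column out as a vector; on a keyed data.table, `[` with a character argument is read as a lookup on the key instead.

```r
library(data.table)

# Hypothetical toy data keyed on id
toy <- list(
  data.table(id = c("a", "b", "c"), val = 1:3, key = "id"),
  data.table(id = c("b", "c", "d"), val = 4:6, key = "id"),
  data.table(id = c("a", "d", "e"), val = 7:9, key = "id")
)

# Variant 1: wrap setdiff so it compares only the id columns
res1 <- mapply(function(x, y) setdiff(x$id, y$id),
               head(toy, -1), tail(toy, -1), SIMPLIFY = FALSE)

# Variant 2: extract the id vectors first, then setdiff them pairwise
ids  <- lapply(toy, "[[", "id")
res2 <- mapply(setdiff, head(ids, -1), tail(ids, -1), SIMPLIFY = FALSE)

identical(res1, res2)  # both give list("a", c("b", "c"))
```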

Edit 2: you've changed your expected output again by adding a plain-English description that does not match the code you had provided. I think you are now looking for this:

mapply(function(x,y)y[setdiff(y$id, x$id), ],
       head(dtl, -1), tail(dtl, -1), SIMPLIFY = FALSE)
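As a quick check, the sketch below runs that final call on a small made-up list (toy is hypothetical data, not from the question). It relies on every table being keyed on id, as the question's setkey step ensures: with a key, y[<character vector>] is a keyed lookup that returns the matching rows with all of y's columns.

```r
library(data.table)

# Hypothetical toy data; key = "id" is what makes y[<character vector>] a keyed lookup
toy <- list(
  data.table(id = c("a", "b", "c"), val = 1:3, key = "id"),
  data.table(id = c("b", "c", "d"), val = 4:6, key = "id"),
  data.table(id = c("a", "d", "e"), val = 7:9, key = "id")
)

# The answer's call: for each consecutive pair (x, y), rows of y whose id is absent from x
keyed <- mapply(function(x, y) y[setdiff(y$id, x$id), ],
                head(toy, -1), tail(toy, -1), SIMPLIFY = FALSE)
keyed
```

Here the first element is the row d/6 and the second the rows a/7 and e/9. If the tables were not keyed, data.table would not treat the character vector as a lookup; as far as I know it then asks for a key or an on= argument.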

Licensed under: CC-BY-SA with attribution
