consecutive setdiff in datatable list
11-12-2019
Question
Using data organised as
library(data.table)
dtl <- replicate(10, data.table(id = sample(letters, 10), val = sample(10)), simplify = FALSE)
invisible(lapply(dtl, function(x) setkey(x, "id")))  # setkey() works by reference
I need to extract a list of data.tables that contain the rows in dtl[[n+1]] whose id is not present in dtl[[n]]. I assume it would be something like
dtl2 <- list(setdiff(dtl[[1]][['id']],dtl[[2]][['id']]),setdiff(dtl[[2]][['id']],dtl[[3]][['id']]...)
Please note that, while the setdiff should only take the id column into account, I expect the result to contain all columns from each data.table.
Solution
I think this will do it for you:
mapply(setdiff, head(dtl, -1), tail(dtl, -1), SIMPLIFY = FALSE)
Edit: with your new expected output, I would still use mapply as above, but with one of the following two changes:
- replace setdiff with function(x, y) setdiff(x$id, y$id)
- replace dtl with ids <- lapply(dtl, "[[", "id")
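To see which direction the id comparison runs, here is a minimal sketch with two small made-up tables (x and y are hypothetical stand-ins for dtl[[n]] and dtl[[n+1]], not data from the question):

```r
library(data.table)

# Hypothetical stand-ins for two consecutive list elements
x <- data.table(id = c("a", "b", "c"), val = 1:3)
y <- data.table(id = c("b", "c", "d"), val = 4:6)

# setdiff() is asymmetric: it keeps the elements of its first
# argument that do not appear in its second argument
in_x_only <- setdiff(x$id, y$id)  # "a"
in_y_only <- setdiff(y$id, x$id)  # "d"
```

So setdiff(x$id, y$id) returns ids that disappear between the two tables, while swapping the arguments returns ids that are new in the second one.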
Edit 2: you've changed your expected output again by adding a plain English description that does not match the code you had provided. I think you are now looking for this:
mapply(function(x,y)y[setdiff(y$id, x$id), ],
head(dtl, -1), tail(dtl, -1), SIMPLIFY = FALSE)
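Note that the character subset above relies on each table being keyed on id. As a possible alternative that does not need a key, the same rows can be selected with a data.table anti-join. This is only a sketch: the names prev, cur and new_rows are made up for illustration, and set.seed() is there purely to make the sample data reproducible.

```r
library(data.table)

set.seed(1)  # assumption: only for reproducible sample data
dtl <- replicate(10, data.table(id = sample(letters, 10), val = sample(10)),
                 simplify = FALSE)

# Anti-join: keep the rows of `cur` whose id has no match in `prev`.
# All columns of `cur` are returned, and no key is required.
new_rows <- Map(function(prev, cur) cur[!prev, on = "id"],
                head(dtl, -1), tail(dtl, -1))

length(new_rows)  # one result per consecutive pair of tables
```

Map(f, ...) is equivalent to mapply(f, ..., SIMPLIFY = FALSE), so the result is again a list of data.tables. Row order may differ from the keyed version, which returns rows sorted by id.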
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow