Question

I'm just learning how to use data.table and I'm running into an unexpected issue. I have a large dataset, all, keyed on several columns (21 columns, ~20K rows). I aggregate all grouped by 2 of its key columns and call the result fail. When I try to filter the rows of all with fail, it only works if the key columns of fail are the first 2 key columns of all, which is not naturally the case. How can I tell data.table to ignore the key columns in all that fail does not share?
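The behaviour described above comes from how a keyed join X[Y] pairs columns: when Y is a keyed data.table, its key columns are matched to the leading key columns of X in order, by position rather than by name. A minimal sketch with two hypothetical toy tables, X standing in for all and Y for fail[V1==T]:

```r
library(data.table)

# Hypothetical toy tables: X plays the role of `all`, Y the role of fail[V1==T]
X <- data.table(rep = c(1L, 1L, 2L), loc = c("a", "b", "a"),
                foo = c("P", "Q", "R"), x = c(10, 20, 30))
Y <- data.table(rep = 2L, loc = "a", V1 = TRUE)
setkey(Y, rep, loc)

# Key (rep, loc, foo): Y's key columns rep, loc line up with X's first two keys
setkey(X, rep, loc, foo)
ok <- X[Y]     # matches the row with rep 2, loc "a", foo "R"

# Key (rep, foo, loc): Y's 2nd key column (loc) is now compared with X's foo,
# because X[Y] pairs key columns by position rather than by name
setkey(X, rep, foo, loc)
bad <- X[Y]    # no foo equals "a", so the row comes back with x = NA
```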

Test dataset:

library(data.table)
set.seed(42)
all <- data.table(rep = rep(1:2, each = 15, times = 10),
                  loc = rep(letters[1:15], 20),
                  foo = sample(LETTERS, 300, replace = TRUE), # 26 letters do not recycle evenly into 300 rows
                  x   = rnorm(n = 300))
setkey(all, rep, loc, foo) # note that foo is last instead of 2nd, which would be its desired default position

fail<- all[,sum(x) < -5, by=list(rep,loc)]
setkey(fail, rep,loc)
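As a side note on the aggregation step, data.table's keyby= argument groups like by= but also sorts the result and sets its key, so the separate setkey(fail, rep, loc) call can be folded in. A minimal sketch on a hypothetical, smaller stand-in for all:

```r
library(data.table)

set.seed(42)
# Hypothetical, smaller stand-in for `all` with the same rep/loc layout
all <- data.table(rep = rep(1:2, each = 6),
                  loc = rep(letters[1:3], 4),
                  x   = rnorm(12))

# keyby= groups like by= but also sorts the result and sets its key to (rep, loc),
# which replaces the separate setkey(fail, rep, loc) step
fail <- all[, sum(x) < -5, keyby = .(rep, loc)]
key(fail)  # returns c("rep", "loc")
```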

fail[V1==T]
       rep loc   V1
    1:   2   d TRUE

Working filter (all keyed by rep, loc, foo):

all[fail[V1==T]]
        rep loc foo           x   V1
     1:   1   f   A -0.46972958 TRUE
     2:   1   f   B  0.18819303 TRUE
     3:   1   f   C -0.65850343 TRUE
     4:   1   f   D -0.88577630 TRUE
     5:   1   f   I  0.08489806 TRUE
     6:   1   f   K -2.44046693 TRUE
     7:   1   f   R -0.43144620 TRUE
     8:   1   f   T  1.81522845 TRUE
     9:   1   f   U -1.01759612 TRUE
    10:   1   f   W -2.11320011 TRUE

Non-working filter, due to the changed key order:

setkey(all, rep, foo, loc) #foo moved from last to 2nd key
all[fail[V1==T]]
       rep foo loc  x   V1
    1:   2   d  NA NA TRUE
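With the key in rep, foo, loc order, the join compares loc in fail against foo in all, so nothing matches. One way around this without giving up that ordering for good is to re-key temporarily: put rep, loc first, join, then restore the original key. A minimal sketch on a hypothetical smaller table (the threshold of 0 instead of -5 is an assumption so that the toy data produces some TRUE groups):

```r
library(data.table)

set.seed(42)
# Hypothetical small stand-in for the question's `all`, keyed in the "bad" order
all <- data.table(rep = rep(1:2, each = 6), loc = rep(letters[1:3], 4),
                  foo = sample(LETTERS, 12), x = rnorm(12))
setkey(all, rep, foo, loc)

# Threshold of 0 (instead of -5) is an assumption so the toy data has some TRUE groups
fail <- all[, sum(x) < 0, keyby = .(rep, loc)]

old_key <- key(all)              # remember c("rep", "foo", "loc")
setkeyv(all, c("rep", "loc"))    # make the join columns the leading key columns
res <- all[fail[V1 == TRUE]]     # the keyed join now pairs rep-rep and loc-loc
setkeyv(all, old_key)            # restore the original key order afterwards
```

Each (rep, loc) group in this toy table has 2 rows, so res should hold 2 rows per TRUE group in fail.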

Solution

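I assume you don't want to re-key the all table (though that would do the trick as well). The keyed join all[fail[V1==T]] matches the key columns of fail to the leading key columns of all by position, not by name, so after setkey(all, rep, foo, loc) the loc values in fail are compared against foo. merge, which dispatches to data.table's merge.data.table method, joins on the columns named in by instead. Does that do what you want?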

merge(all, fail[V1==T], by = c("rep", "loc"))
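In data.table versions that provide the on= argument, a join inside [ can also name its columns explicitly, so the key order of all stops mattering there too. A sketch on a hypothetical smaller table, comparing merge with on= (the threshold of 0 is an assumption so the toy data yields some TRUE groups):

```r
library(data.table)

set.seed(42)
# Hypothetical small stand-in for `all`, keyed in the order that broke X[Y]
all <- data.table(rep = rep(1:2, each = 6), loc = rep(letters[1:3], 4),
                  foo = sample(LETTERS, 12), x = rnorm(12))
setkey(all, rep, foo, loc)
# Threshold of 0 is an assumption so the toy data yields some TRUE groups
fail <- all[, sum(x) < 0, keyby = .(rep, loc)]

# merge() dispatches to merge.data.table and joins on the named columns
res_merge <- merge(all, fail[V1 == TRUE], by = c("rep", "loc"))

# on= also names the join columns, so neither table's key order matters
res_on <- all[fail[V1 == TRUE], on = c("rep", "loc")]
```

Neither call changes the key of all, which stays rep, foo, loc.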
Licensed under: CC-BY-SA with attribution