R: Force data.table to compute all interactions

https://stackoverflow.com/questions/20566260

01-09-2022
|

Question

Here is a data.table:

dat = data.table(var1=rnorm(120), var2=rep(c('a','b','c'),40), var3=rep(c(1,2,3,2,1,2,1,2,2,3,1,2),10))

dat2 = dat[,list(resp = mean(var1)),by=list(var2, var3)]

In dat2, only existing interactions of dat$var2 et dat$var3 are present. How can I force dat2 to contain results for all 9 possible interactions (instead of the 7 rows of dat2) for var2 and var3? If there is no direct solutions with data.table, what is the easiest way to solve this issue?

table(dat$var2, dat$var3)

     1  2  3
  a 20 10 10
  b 20 20  0
  c  0 30 10

Of course, for the interactions where no data exist in dat, dat2 should contain NA in resp.

Solution

You could set the key and then do a crossjoin using CJ in the i like so...

setkey( dat , var2 , var3 )

# Thanks to @Shadow for pointing out to use unique() in the cross join
dat[ CJ( unique(var2) , unique(var3) ) , mean(var1) ]
#   var2 var3          V1
#1:    a    1 -0.25771923
#2:    a    2  0.04143057
#3:    a    3  0.28878451
#4:    b    1  0.18865887
#5:    b    2  0.53632552
#6:    b    3          NA
#7:    c    1          NA
#8:    c    2  0.38015021
#9:    c    3  0.49809159

And by way of explanation, CJ() creates a data.table in the i of x (in this case dat) to join on. It is formed as the cross product of the vectors supplied to CJ(), which happens to be precisely what you are looking for!

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow