How can I construct an outer product matrix from a data.table

Question 1

Here's one way (not sure how it scales though):

rownames(M) <- colnames(M) <- LETTERS[1:3]
ans <- DT[, list(idx1=.I, idx2=rep(.I, each=.N), 
            val=as.vector(M[Key1, Key1])), by=Key2]
dcast.data.table(ans, idx2 ~ idx1, value.var="val", fill=0L)

#    idx2 1   2   3   4   5   6 7
# 1:    1 1 0.0 0.0 0.0 0.0 0.0 0
# 2:    2 0 1.0 0.4 0.0 0.0 0.0 0
# 3:    3 0 0.4 1.0 0.0 0.0 0.0 0
# 4:    4 0 0.0 0.0 1.0 0.4 0.3 0
# 5:    5 0 0.0 0.0 0.4 1.0 0.2 0
# 6:    6 0 0.0 0.0 0.3 0.2 1.0 0
# 7:    7 0 0.0 0.0 0.0 0.0 0.0 1

dcast.data.table is available from data.table versions >= 1.9.0. The current stable CRAN version at the time of writing is 1.9.2.

Question 2

This does what you want:

1.Set up the data.

DT <- data.frame(Key1 = c("A", "A", "B", "A", "B", "C", "C"), Key2 = c(1, 2, 2, 3, 3, 3, 4))

M <- matrix(c(1, 0.4, 0.3, 0.4, 1, 0.2, 0.3, 0.2, 1), nrow = 3)

2.Subset the matrix, grouping by Key2 (this returns a list).

BD <- by(DT, DT$Key2, function(df) {N = as.numeric(df$Key1); M[N,N]})

3.Construct a block-diagonal matrix.

library(magic)

do.call(adiag, BD)

Question 3

My data.table skills aren't that strong, but I came up with a solution that takes advantage of the indices, ~~but only if I added the row numbers~~.

# DT$row<-1:nrow(DT) # No longer necessary.
# Add dimension names to matrix for convenience
rownames(M)<-colnames(M)<-c('A','B','C') 

f<-function(k1,k2) {
  # rows<-DT[.(k2)]$row 
  rows<-DT[.(k2),.I]$.I
  ret<-rep(0,nrow(DT))
  ret[rows]<-M[DT[.(k2)]$Key1,k1]
  ret
}

mapply(f,DT$Key1,DT$Key2)
#      A   A   B   A   B   C C
# [1,] 1 0.0 0.0 0.0 0.0 0.0 0
# [2,] 0 1.0 0.4 0.0 0.0 0.0 0
# [3,] 0 0.4 1.0 0.0 0.0 0.0 0
# [4,] 0 0.0 0.0 1.0 0.4 0.3 0
# [5,] 0 0.0 0.0 0.4 1.0 0.2 0
# [6,] 0 0.0 0.0 0.3 0.2 1.0 0
# [7,] 0 0.0 0.0 0.0 0.0 0.0 1

This should be a little better in the sense that the indices will be called. More efficient solutions might take advantage of the known diagonal nature of the output matrix. ~~I wonder if there is a way to do this without adding the row numbers?~~ The comment below indicated one way of getting the row number, I have implemented it above.

Question 4

This is edited to use native data.table() features - hopefully it should perform better!

# make the cor matrix into an expand.grid equivalent - all combos - using CJ for cross join
cor_list<-data.table(CJ(LETTERS[1:nrow(M)],LETTERS[1:nrow(M)]))
# fill with the values for M
cor_list[,cor:=unlist(as.list(M))]
# index on combination of correlation inputs
setkey(cor_list, V1, V2)

# lookup correlation for all combos of DT v DT
DTX<-DT[,cor_list[J(Key1,DT[,Key1],DT[,Key2])],by=c("Key1","Key2")]
# index on Key2
setkey(DTX,Key2)
# Set cor=0 where Key2 doesn't match (OK, it's a bit of a hack!)
DTX[Key2!=V3,cor:=0]

# fill a matrix with the vector of correlations (it fits)
# original length of DT gives you the length of side 
matrix(DTX[,cor],nrow(DT))

     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1  0.0  0.0  0.0  0.0  0.0    0
[2,]    0  1.0  0.4  0.0  0.0  0.0    0
[3,]    0  0.4  1.0  0.0  0.0  0.0    0
[4,]    0  0.0  0.0  1.0  0.4  0.3    0
[5,]    0  0.0  0.0  0.4  1.0  0.2    0
[6,]    0  0.0  0.0  0.3  0.2  1.0    0
[7,]    0  0.0  0.0  0.0  0.0  0.0    1

EDITED ABOVE - DOUBLE APPLY SLOW AS PER ROLAND'S COMMENT

How about this?

#function to return letter corresponding to number
lookup_letter<-function(let){match(let,matrix(c("A","B","C")))}

then nest 2 apply calls for each dimension of the matrix

apply(DT,1,function(x){                # call row-wise
  apply(DT,1,function(y)ifelse(y[2]==x[2],M[lookup_letter(x[1]),lookup_letter(y[1])],0))   # call column-wise lookup
  })

     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1  0.0  0.0  0.0  0.0  0.0    0
[2,]    0  1.0  0.4  0.0  0.0  0.0    0
[3,]    0  0.4  1.0  0.0  0.0  0.0    0
[4,]    0  0.0  0.0  1.0  0.4  0.3    0
[5,]    0  0.0  0.0  0.4  1.0  0.2    0
[6,]    0  0.0  0.0  0.3  0.2  1.0    0
[7,]    0  0.0  0.0  0.0  0.0  0.0    1

Probably there are better ways to lookup your correlation number, but this gives you an idea (maybe flatten M into an indexed list)