Yeah, I don't know anything about scalding, but this seems odd. If you look at the zip implementation, it specifically mentions that it does an outer join to preserve zeros on either side. So the comment does not seem to apply to how zeros are actually treated in matrix.zip.
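To make the outer-join point concrete, here is a minimal sketch in plain Scala collections. It is not Scalding's code: the Sparse type and the outerZip helper are hypothetical names I made up for illustration. It only shows what "preserving zeros on either side" means for a sparse matrix stored as a map of non-zero cells.

```scala
object ZipSketch {
  // Hypothetical sparse representation: only non-zero cells are stored.
  type Sparse = Map[(Int, Int), Double]

  // Outer-join style zip: a cell present on either side survives,
  // and the missing side is filled in with an explicit zero.
  def outerZip(left: Sparse, right: Sparse): Map[(Int, Int), (Double, Double)] =
    (left.keySet ++ right.keySet).iterator
      .map(k => (k, (left.getOrElse(k, 0.0), right.getOrElse(k, 0.0))))
      .toMap

  def main(args: Array[String]): Unit = {
    val left: Sparse = Map((0, 0) -> 1.0)
    val right: Sparse = Map((0, 1) -> 2.0)
    println(outerZip(left, right))
  }
}
```

With an outer join like this, the result is not restricted to the non-zero cells of the left-hand matrix, which is why the comment in the question looks off.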
Besides, looking at the dimension returned by zip, it really seems this line just replicates the aSumVct column vector for each column:
val xMat = intersectMat.zip(aSumVct).mapValues( pair => pair._2 )
I also find val bSumVct = aBinary.sumRowVectors suspicious, because it sums the matrix along the wrong dimension. It feels like something like this would be better:
val bSumVct = aBinary.transpose.sumRowVectors
This would conceptually be the same as aSumVct.transpose, so that at the end of the day, in cell (i, j) of xMat + yMat we find the sum of the elements of row(i) plus the sum of the elements of row(j). We then subtract intersectMat to adjust for the double counting.
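As a sanity check of that reasoning, here is a small sketch in plain Scala, without Scalding, that computes the same quantities on a dense binary matrix. The object and function names are hypothetical, and returning 0.0 when the union is empty is my own assumption, not something taken from the original code.

```scala
object JaccardSketch {
  type Matrix = Vector[Vector[Double]]

  // Sum of each row of a binary matrix: the size of the set stored in that row.
  def rowSums(a: Matrix): Vector[Double] = a.map(_.sum)

  // intersect(i)(j) = |row(i) ∩ row(j)|, i.e. cell (i, j) of A * A^T.
  def intersect(a: Matrix): Matrix =
    a.map(ri => a.map(rj => ri.zip(rj).map { case (x, y) => x * y }.sum))

  // union(i)(j) = rowSum(i) + rowSum(j) - intersect(i)(j);
  // the subtraction removes the elements counted twice.
  def union(a: Matrix): Matrix = {
    val sums = rowSums(a)
    val inter = intersect(a)
    Vector.tabulate(a.size, a.size)((i, j) => sums(i) + sums(j) - inter(i)(j))
  }

  // Jaccard similarity per pair of rows; 0.0 for an empty union (assumption).
  def jaccard(a: Matrix): Matrix = {
    val inter = intersect(a)
    val uni = union(a)
    Vector.tabulate(a.size, a.size) { (i, j) =>
      if (uni(i)(j) == 0.0) 0.0 else inter(i)(j) / uni(i)(j)
    }
  }

  def main(args: Array[String]): Unit = {
    // Row 0 is the set {0, 1}, row 1 is the set {1, 2}.
    val a: Matrix = Vector(Vector(1.0, 1.0, 0.0), Vector(0.0, 1.0, 1.0))
    println(jaccard(a))
  }
}
```

For the two rows in main, the intersection has one element and the union has 2 + 2 - 1 = 3, so cell (0, 1) comes out as 1/3, and the diagonal is 1.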
Edit: a little bit of googling unearthed this blog post: http://www.flavianv.me/post-15.htm. It seems the comments were related to that version where the vectors to compare are in two separate matrices that don't necessarily have the same size.