Question

I have been trying to figure this out, but without much success. I am working with a table of binary data (0s and 1s). I managed to compute a distance matrix from my data using the R function dist(x, method="binary"), but I am not sure exactly how this function calculates the distances. Is it using the Jaccard coefficient J = M11/(M10 + M01 + M11)?

Solution

This is easily found in the help page ?dist:

This function computes and returns the distance matrix computed by using the specified distance measure to compute the distances between the rows of a data matrix.

[...]

binary: (aka asymmetric binary): The vectors are regarded as binary bits, so non-zero elements are ‘on’ and zero elements are ‘off’. The distance is the proportion of bits in which only one is on amongst those in which at least one is on.

This is equivalent to the Jaccard distance as described in Wikipedia:

An alternate interpretation of the Jaccard distance is as the ratio of the size of the symmetric difference to the union.

In your notation, it is 1 - J = (M01 + M10)/(M01 + M10 + M11).
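
As a quick check, the equivalence can be verified numerically. The following is a minimal sketch using two made-up binary vectors (a and b are illustrative, not taken from the question); it counts M11, M10 and M01 by hand and compares the result with what dist() returns:

```r
# Two hypothetical binary vectors (illustrative data only)
a <- c(1, 1, 0, 0, 1, 1)
b <- c(1, 0, 1, 0, 1, 1)

# Counts in the question's notation
m11 <- sum(a == 1 & b == 1)  # positions where both are on
m10 <- sum(a == 1 & b == 0)  # positions where only a is on
m01 <- sum(a == 0 & b == 1)  # positions where only b is on

jaccard_coef <- m11 / (m10 + m01 + m11)          # J
jaccard_dist <- (m10 + m01) / (m10 + m01 + m11)  # 1 - J

# dist() computes distances between the rows of a matrix
d <- as.numeric(dist(rbind(a, b), method = "binary"))

isTRUE(all.equal(d, jaccard_dist))      # dist() matches (M01 + M10)/(M01 + M10 + M11)
isTRUE(all.equal(d, 1 - jaccard_coef))  # and therefore 1 - J
```

Positions where both vectors are 0 do not enter either formula, which is why the help page calls this the asymmetric binary distance.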

Licensed under: CC-BY-SA with attribution