How does the command dist(x, method="binary") calculate the distance matrix?

https://stackoverflow.com/questions/23686028

r
distance

23-07-2023

Question

I have been trying to figure this out, but without much success. I am working with a table of binary data (0s and 1s). I managed to compute a distance matrix from my data using the R function dist(x, method="binary"), but I am not sure exactly how this function calculates the distances. Is it using the Jaccard coefficient J = (M11)/(M10 + M01 + M11)?

Solution

This is easily found in the help page ?dist:

This function computes and returns the distance matrix computed by using the specified distance measure to compute the distances between the rows of a data matrix.

[...]

binary: (aka asymmetric binary): The vectors are regarded as binary bits, so non-zero elements are ‘on’ and zero elements are ‘off’. The distance is the proportion of bits in which only one is on amongst those in which at least one is on.

This is equivalent to the Jaccard distance as described in Wikipedia:

An alternate interpretation of the Jaccard distance is as the ratio of the size of the symmetric difference to the union.

In your notation, dist returns the Jaccard distance rather than the coefficient J itself: 1 - J = (M01 + M10)/(M01 + M10 + M11).
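
As a quick check of this equivalence, here is a minimal sketch that computes the Jaccard distance by hand and compares it with the result of dist. The vectors a and b are made-up illustration data, and the names m11, m10, m01, jaccard_dist and r_dist are chosen here, not taken from the question.

```r
# Two hypothetical binary vectors (illustration data only)
a <- c(1, 1, 0, 0, 1, 0)
b <- c(1, 0, 1, 0, 1, 0)

# Count the positions by on/off pattern; non-zero counts as "on"
m11 <- sum(a != 0 & b != 0)  # on in both
m10 <- sum(a != 0 & b == 0)  # on in a only
m01 <- sum(a == 0 & b != 0)  # on in b only

# Jaccard distance computed by hand: 1 - J
jaccard_dist <- (m01 + m10) / (m01 + m10 + m11)

# dist() works on the rows of a matrix, so stack the vectors as rows
r_dist <- as.numeric(dist(rbind(a, b), method = "binary"))

# TRUE if the two values agree up to numerical tolerance
all.equal(jaccard_dist, r_dist)
```

Positions where both vectors are 0 appear in neither the numerator nor the denominator, which is why the help page calls this the asymmetric binary distance.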

Licensed under: CC-BY-SA with attribution
