How does the command dist(x, method="binary") calculate the distance matrix?

https://stackoverflow.com/questions/23686028

r
distance

23-07-2023

Question

I have been trying to figure this out, but without much success. I am working with a table of binary data (0s and 1s). I managed to compute a distance matrix from my data using the R function dist(x, method="binary"), but I am not sure exactly how this function computes the distances. Is it using the Jaccard coefficient J = M11/(M10 + M01 + M11)?

Solution

This is easily found in the help page ?dist:

This function computes and returns the distance matrix computed by using the specified distance measure to compute the distances between the rows of a data matrix.

[...]

binary: (aka asymmetric binary): The vectors are regarded as binary bits, so non-zero elements are ‘on’ and zero elements are ‘off’. The distance is the proportion of bits in which only one is on amongst those in which at least one is on.

This is equivalent to the Jaccard distance as described in Wikipedia:

An alternate interpretation of the Jaccard distance is as the ratio of the size of the symmetric difference to the union.

In your notation, it is 1 - J = (M01 + M10)/(M01 + M10 + M11).
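As a quick check of this equivalence, here is a minimal sketch that computes the counts by hand and compares the result with dist(). The matrix x and its row names a and b are made-up example data, not taken from the question.

```r
# Hypothetical example data: two binary rows (assumed, not from the question)
x <- rbind(a = c(1, 1, 0, 0, 1),
           b = c(1, 0, 1, 0, 1))

# Counts in the question's notation
m11 <- sum(x["a", ] == 1 & x["b", ] == 1)  # both rows on
m10 <- sum(x["a", ] == 1 & x["b", ] == 0)  # only row a on
m01 <- sum(x["a", ] == 0 & x["b", ] == 1)  # only row b on

# Jaccard distance: 1 - J
jaccard_dist <- (m01 + m10) / (m01 + m10 + m11)

# Distance between the two rows as computed by dist()
d <- as.numeric(dist(x, method = "binary"))

# TRUE if the hand-computed value matches dist()
all.equal(d, jaccard_dist)
```

Positions where both rows are 0 (M00) do not enter the calculation at all, which is why the help page calls this measure "asymmetric binary".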

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow