Question

I have a table:

   x   y   z 
A  2   0   3   
B  0   3   0    
C  0   0   4    
D  1   4   0

I want to calculate the Jaccard similarity in MATLAB between each pair of the row vectors A, B, C and D. The formula is:

jaccard(x, y) = |x intersect y| / (|x| + |y| - |x intersect y|)

In this formula, |x| and |y| denote the number of nonzero items in each vector. For example, |A| is 2, |B| and |C| are both 1, and |D| is 2.

|x intersect y| denotes the number of positions where both vectors are nonzero. |A intersect B| is 0. |A intersect D| is 1, because column x is the only column that is nonzero in both.

For example: jaccard(A,D) = 1/(2 + 2 - 1) = 1/3 ≈ 0.33

How can I implement this in MATLAB?

Solution

MATLAB's Statistics and Machine Learning Toolbox provides a function that computes the Jaccard distance: pdist.

Here is an example on two random binary vectors:

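X = rand(2,100);   % two random row vectors with values in (0,1)
X(X>0.5) = 1;      % binarize: values above 0.5 become 1 ...
X(X<=0.5) = 0;     % ... and everything else becomes 0

JD = pdist(X,'jaccard')  % Jaccard distance between the two rows
JI = 1 - JD;             % Jaccard index (similarity)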
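
Applied to the table from the question, the same idea can be sketched as follows. The variable names T, P and S are my own, not from the original post. One assumption worth stating: as far as I understand the pdist documentation, the 'jaccard' option compares the actual values, so two nonzero entries that differ (such as 2 and 1 in column x of A and D) would count as a mismatch. That is why the sketch first reduces the table to 0/1 "is nonzero" flags, which matches the definition used in the question.

```matlab
% Rows are A, B, C, D; columns are x, y, z (table from the question).
T = [2 0 3;
     0 3 0;
     0 0 4;
     1 4 0];

% Keep only the zero/nonzero pattern, as the question's formula requires.
P = double(T ~= 0);

% pdist returns one distance per row pair, in the order
% (A,B), (A,C), (A,D), (B,C), (B,D), (C,D).
JD = pdist(P, 'jaccard');
JI = 1 - JD;                 % Jaccard similarity per pair

% squareform (also part of the Statistics Toolbox) turns the pair list
% into a symmetric 4-by-4 matrix; the diagonal of S is 1.
S = 1 - squareform(JD);
```

With this input, the third entry of JI, which corresponds to the pair (A,D), should equal the 1/3 computed by hand in the question.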

EDIT

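A calculation that does not require the Statistics Toolbox:

a = X(1,:);   % first row
b = X(2,:);   % second row
% sum(a & b) counts positions nonzero in both; sum(a | b) counts positions nonzero in either
JD = 1 - sum(a & b)/sum(a | b)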
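
The toolbox-free calculation can also be extended to all row pairs at once. The following is only a sketch under my own naming (T, P, inter, n, uni and J are not from the original post). It follows the question's formula, |x intersect y| / (|x| + |y| - |x intersect y|), and assumes no two rows are both entirely zero, since that case would give 0/0 = NaN.

```matlab
% Rows are A, B, C, D; columns are x, y, z (table from the question).
T = [2 0 3;
     0 3 0;
     0 0 4;
     1 4 0];

P = double(T ~= 0);      % 1 where an item is nonzero, 0 elsewhere

% inter(i,j) = number of columns that are nonzero in both row i and row j
inter = P * P.';

% n(i) = number of nonzero items in row i, i.e. |x| in the question
n = sum(P, 2);

% uni(i,j) = |x| + |y| - |x intersect y|
% bsxfun is used so the sketch also runs on releases without implicit expansion
uni = bsxfun(@plus, n, n.') - inter;

% Pairwise Jaccard similarity; J(1,4) is the (A,D) pair from the question
J = inter ./ uni;
```

For the table above, J(1,4) works out to 1/(2 + 2 - 1) = 1/3, which agrees with the hand calculation in the question, and 1 - J gives the corresponding Jaccard distances.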
Licensed under: CC-BY-SA with attribution