Question

In Java, how can we find similarity measure between two vectors having different length. Consider we have

V1 = [1, 0, 0, 1, 1]
V1 = [1, 0, 1, 0, 1, 0, 1, 0]

I looking for similarity measure other than Jaccard Coefficient or Sørensen–Dice coefficient

Was it helpful?

Solution

As someone already commented, a possible alternative would be the Levenshtein distance, also sometimes referred to as the edit distance.

The Levenshtein distance is a function which assigns to every pair of strings A and B a natural number n, which represents the minimum number of operations need to transform A to B. The allowed operations are

  1. Delete a symbol from A,
  2. Insert a symbol into A,
  3. Replace a symbol in A.

Note that the edit distance is symmetric (as for any sequence of operations that transforms A to B) it is possible to construct an "inverted" sequence of operations which transforms B to A.

The Wikipedia article on the Levenshtein distance lists some useful properties.

Finally, as an example, let's transform your two vectors:

[10011]
// Insert 1 into position 2:
[101011]
// Insert 0 into position 5:
[1010101]
// Insert 0 into position 7:
[10101010]

We found a sequence of 3 operations. If we manage to prove that there are no shorter sequences, then we could conclude that the distance between V1 and V2 is 3. Well, considering that the Levenshtein distance is always at least the difference in size between the two strings (think about why that is), then we have our conclusion:

levenshtein_distance(V1,V2) == 3 // returns true!

Hope this helps!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top