Question

I'm doing some work trying to recommend documents, and to do so I am using the Cosine Similarity method. Here is the code for that method:

static double cosineSimilarity(HashMap<String, Double> v1, HashMap<String, Double> v2) 
{
    Set<String> both = v1.keySet();
    both.retainAll(v2.keySet());
    double sclar = 0, norm1 = 0, norm2 = 0;
    for (String k : both) 
    {
      sclar += v1.get(k) * v2.get(k);
    }
    for (String k : v1.keySet())
    {
      norm1 += v1.get(k) * v1.get(k);
    }
    for (String k : v2.keySet()) 
    {
      norm2 += v2.get(k) * v2.get(k);
    }
    return sclar / Math.sqrt(norm1 * norm2);
}

The problem is that the outcome varies depending on the order that the parameters are passed. For example if I call cosineSimilarity(v1, v2) it will return 0.3 but if I call cosineSimilarity(v2, v1) it will return a completely different value.

I figure this has something to do with the fact that Map.keySet() returns a set backed by the map, but I do not fully understand the implications of this.

Can anyone see where the method is going wrong?

Was it helpful?

Solution

Try

Set<String> both = new HashSet<String>(v1.keySet());

instead of

Set<String> both = v1.keySet();

You shouldn't modify set you got from keySet method because map uses it and when you remove elements from it (by retainAll in your case) elements also removed from map. Example:

Map<Integer, Integer> mp = new HashMap<Integer, Integer>();
mp.put(1, 1);
System.out.println(mp); // output {1=1}
mp.keySet().remove(1);
System.out.println(mp); // output {}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top