Question

I have been given some code to optimise. One of the bits contains some code which takes a set with elements and for all elements in the set compares them to all other elements. The comparison isn't symmetric so no shortcut there. The code looks as follows:

for(String string : initialSet)
{
    Set<String> copiedSet = new HashSet<>(initialSet);
    copiedSet.remove(string);

    for(String innerString : copiedSet)
    {
        /** 
          * Magic, unicorns, and elves! Compare the distance of the two strings by
          *  some very fancy method! No need to detail it here, just believe me it
          *  works, it isn't the subject of the question!
          */
    }
}

To my understanding, the complexity would look as follows: the initial loop has a complexity of O(n) where n is the size of the initial set. Creating a set via the copy constructor would, in my understanding induce equals tests on all elements as the set would need to ensure the contract of the set, that is, no duplicate elements. This would mean that for n insertions, the complexity would increase by the sum from 0 to n-1. The removal would again need to check, in the worst case, n elements. The inner for loop then loops on n-1 elements.

The method I have used this far is simply:

for(String string : set)
{
    for(String innerString : copiedSet)
    {
        if(! string.equals(innerString)
        {
            /** 
              * Magic, unicorns, and elves! Compare the distance of the two strings by
              *  some very fancy method! No need to detail it here, just believe me it
              *  works, it isn't the subject of the question!
              */
        }
    }
}

In my understanding, this would induce a complexity of roughly O(n^2) abstracting the complexity of the code in the if clause.

Therefore, the second piece of code would be better by at least one order plus the sum I outlined above. However, I am working with a dangerous assumption, and that is that I assume how the copy constructor of the HashSet works. Simple benchmarks showed that the results were indeed better for the second snipped by about a factor of n. I would like to tap into your knowledge to confirm these findings and gain more insight into the workings of the copy constructor if possible. Also, the ideal case would be to find a resource listing functions by time complexity but I guess that last one will remain wishful thinking!

Was it helpful?

Solution

The source code for the copy constructor is widely available, so you can study that as well as clone() and see if one of them suits you.

But truly, if all you are trying to do is avoid comparing an element with itself then I think your second idea involving magic, unicorns, and Elvis elves, is probably the best idea of all. Comparing every element in a Set with every other element in it is inherently an O(n2) problem, and you're not going to get much better than that.

OTHER TIPS

There's no reason to compare the elements in a Set. By definition, they are all different to one another.

From the javadoc:

A collection that contains no duplicate elements.

More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element.

As implied by its name, this interface models the mathematical set abstraction.

If you have different type of collection, though, and want to skip the comparing with self, you can't iterate with a step variable(s) (i and j) and skip the steps in which they are equal. For example:

for (int i = 0; i < collection.size(); i++) {
    for (int j = 0; j < collection.size(); j++) {
        if (i != j) {
             //compare
        }
    }
}

I'm not sure exactly what you are doing in your "comparison" but if it really is just finding matching elements then the Set Interface at http://docs.oracle.com/javase/tutorial/collections/interfaces/set.html has some useful methods.

For example:

  • s1.retainAll(s2) — transforms s1 into the intersection of s1 and s2. (The intersection of two sets is the set containing only the elements common to both sets.)

  • s1.removeAll(s2) — transforms s1 into the (asymmetric) set difference of s1 and s2. (For example, the set difference of s1 minus s2 is the set containing all of the elements found in s1 but not in s2.)

  • s1.addAll(s2) — transforms s1 into the union of s1 and s2. (The union of two sets is the set containing all of the elements contained in either set.)

This lets you easily get intersections, combinations, etc for Java Sets.

In general the Java collections classes use very efficient algorithms so you are unlikely to improve upon them without a lot of work.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top