I have been given some code to optimise. One of the bits contains some code which takes a set with elements and for all elements in the set compares them to all other elements. The comparison isn't symmetric so no shortcut there. The code looks as follows:
for(String string : initialSet)
{
Set<String> copiedSet = new HashSet<>(initialSet);
copiedSet.remove(string);
for(String innerString : copiedSet)
{
/**
* Magic, unicorns, and elves! Compare the distance of the two strings by
* some very fancy method! No need to detail it here, just believe me it
* works, it isn't the subject of the question!
*/
}
}
To my understanding, the complexity would look as follows: the initial loop has a complexity of O(n)
where n is the size of the initial set. Creating a set via the copy constructor would, in my understanding induce equals
tests on all elements as the set would need to ensure the contract of the set, that is, no duplicate elements. This would mean that for n insertions, the complexity would increase by the sum from 0 to n-1. The removal would again need to check, in the worst case, n elements. The inner for loop then loops on n-1 elements.
The method I have used this far is simply:
for(String string : set)
{
for(String innerString : copiedSet)
{
if(! string.equals(innerString)
{
/**
* Magic, unicorns, and elves! Compare the distance of the two strings by
* some very fancy method! No need to detail it here, just believe me it
* works, it isn't the subject of the question!
*/
}
}
}
In my understanding, this would induce a complexity of roughly O(n^2)
abstracting the complexity of the code in the if clause
.
Therefore, the second piece of code would be better by at least one order plus the sum I outlined above. However, I am working with a dangerous assumption, and that is that I assume how the copy constructor of the HashSet
works. Simple benchmarks showed that the results were indeed better for the second snipped by about a factor of n. I would like to tap into your knowledge to confirm these findings and gain more insight into the workings of the copy constructor if possible. Also, the ideal case would be to find a resource listing functions by time complexity but I guess that last one will remain wishful thinking!