calculate complexity of LinkedHashSet

https://softwareengineering.stackexchange.com/questions/245362

04-10-2020
|

Question

I have an ArrayList<LinkedHashSet<String>> setOfStrings for example this arraylist internally is composed like:

positionX[hello,car,three,hotel,beach]
positionY[.....]
...

I want to find car inside this data-structure, so I did it

for (Iterator<LinkedHashSet<String>> iterator = setOfStrings.iterator(); iterator.hasNext();) { 
    LinkedHashSet<String> subSet = iterator.next();
    if (subSet.contains("hotel"))
        System.out.println("found");
}

The for iterate over the entire arrayList and the complexity in the worst-case is O(n), but I'm confused on the complexity of the method contains() of set. In according of javadocs this method is executed in constant time, but I've heard that in certain cases the complexity might become O(n). Saying that, I'm not sure about the complexity of this snippet algorithm.

Can someone provide me an explanation of that?

Solution

LinkedHashSet, for the intents and purposes of being accessed using contains is simply a hash set. It uses the return from hashCode of the objects inserted in it to determine the position to place it in in the hash set. If you have a collision, then it will then check the next element. If that is occupied, it will then check the one after that, and so forth. Therefore, for hash sets with relatively small capacity or types which do not return distinguishable hashCode values, you will see up to O(n) complexity for insertion or checking the esistence of an item in the hash set. However most times you don't see collisions and so in most cases it will be O(1).

Combine this with a O(n) operation on all entires in your ArrayList, and you end up with O(n)*O(1) complexity on average or O(n). However if the hashCode() does not properly distinguish values or if the capacity is small for the LinkedHashSet, you may see up to O(n*m) complexity (O(n)*O(m)) where n is the number of elements in your ArrayList and m being the number of elements on average in each LinkedHashSet.

Hope that answers your question!

OTHER TIPS

For hashing operations like the contains() you have above, the worst case complexity is big O of n. This happens when there are n instances with the same hash value and the hashing implementation is chaining. This also happens when n instances have the same hash-value sequence and the implementation is open-addressing. Both of these cases are unlikely, but nevertheless contains() is big O of (n) and small O of (n) and thus is theta of (n).

In practice, people often care about average running time complexity, and average running time complexity of contains() running on a large enough sequence of input is indeed amortized into about constant.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange