Question

I am trying to clear up some things regarding complexity in some of the operations of TreeSet. On the javadoc it says:

"This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains)."

So far so good. My question is what happens on addAll(), removeAll() etc. Here the javadoc for Set says:

"If the specified collection is also a set, the addAll operation effectively modifies this set so that its value is the union of the two sets."

Is it just explaining the logical outcome of the operation or is it giving a hint about the complexity? I mean, if the two sets are represented by e.g. red-black trees it would be better to somehow join the trees than to "add" each element of one to the other.

In any case, is there a way to combine two TreeSets into one with O(log n) complexity?

Thank you in advance. :-)


Solution

You could imagine optimizing special cases to O(log n), but the worst case has got to be O(m log n), where m and n are the numbers of elements in the two trees.
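To make the O(m log n) figure concrete, here is a minimal sketch of a union done one element at a time: m calls to add, each costing O(log n) on a balanced tree. The class and method names (ElementwiseUnion, addEach) are made up for illustration and are not part of the JDK.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;

class ElementwiseUnion {
    // Hypothetical helper: inserts every element of source into target
    // and returns how many were actually new. Each add() is one
    // O(log n) tree insertion, so m elements cost O(m log n) overall.
    static <T> int addEach(TreeSet<T> target, Collection<? extends T> source) {
        int added = 0;
        for (T element : source) {
            // add() returns false when the element is already present
            if (target.add(element)) {
                added++;
            }
        }
        return added;
    }

    public static void main(String[] args) {
        TreeSet<Integer> a = new TreeSet<>(Arrays.asList(1, 3, 5, 7));
        TreeSet<Integer> b = new TreeSet<>(Arrays.asList(3, 4, 7, 9));
        int added = addEach(a, b);
        // prints: 2 new, union = [1, 3, 4, 5, 7, 9]
        System.out.println(added + " new, union = " + a);
    }
}
```

Because add already rejects duplicates, overlapping sets need no separate contains check.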

Edit:

http://net.pku.edu.cn/~course/cs101/resource/Intro2Algorithm/book6/chap14.htm

That page describes a special-case algorithm that can join trees in O(log(m + n)), but note the restriction: all members of S1 must be less than all members of S2. This is what I meant by special optimizations for special cases.

OTHER TIPS

Looking at the Java source for TreeSet, it looks like if the passed-in collection is a SortedSet, it uses an O(n) time algorithm. Otherwise it calls super.addAll, which I'm guessing results in O(n log n).

EDIT: I guess I read the code too fast. TreeSet can only use the O(n) algorithm if its backing map is empty.
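The two situations described above look like this from the caller's side. This is only a sketch: the class and method names are invented, and which internal path addAll takes is an implementation detail of the source that was read, not something the Javadoc promises. The observable result is the same either way.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

class AddAllPaths {
    // Empty target plus a SortedSet argument with the same (natural)
    // ordering: the case in which the linear-time path can apply.
    static TreeSet<Integer> copyIntoEmpty(SortedSet<Integer> source) {
        TreeSet<Integer> target = new TreeSet<>();
        target.addAll(source);
        return target;
    }

    // Target already holds elements: addAll falls back to inserting
    // the source elements one by one.
    static TreeSet<Integer> mergeIntoNonEmpty(TreeSet<Integer> target,
                                              SortedSet<Integer> source) {
        target.addAll(source);
        return target;
    }

    public static void main(String[] args) {
        SortedSet<Integer> source = new TreeSet<>(Arrays.asList(1, 2, 3));
        // prints: [1, 2, 3]
        System.out.println(copyIntoEmpty(source));
        // prints: [1, 2, 3, 10]
        System.out.println(
            mergeIntoNonEmpty(new TreeSet<>(Arrays.asList(2, 10)), source));
    }
}
```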

According to this blog post:
http://rgrig.blogspot.com/2008/06/java-api-complexity-guarantees.html
it's O(n log n). Because the documentation gives no hints about the complexity, you might want to write your own algorithm if the performance is critical for you.
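As one possible "own algorithm", here is a sketch of a linear-time merge of two sorted sets, in the style of the merge step of merge sort. It assumes both sets use natural ordering and contain no nulls (null is used as an end-of-input marker), and it returns a sorted List rather than a TreeSet; the names SortedUnion and union are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class SortedUnion {
    // Walks both sets in ascending order once, so the cost is O(m + n).
    // Assumes natural ordering and no null elements in either set.
    static <T extends Comparable<? super T>> List<T> union(SortedSet<T> a,
                                                           SortedSet<T> b) {
        List<T> out = new ArrayList<>(a.size() + b.size());
        Iterator<T> ia = a.iterator();
        Iterator<T> ib = b.iterator();
        T x = ia.hasNext() ? ia.next() : null;
        T y = ib.hasNext() ? ib.next() : null;
        while (x != null && y != null) {
            int c = x.compareTo(y);
            if (c < 0) {            // x is smaller: take it, advance a
                out.add(x);
                x = ia.hasNext() ? ia.next() : null;
            } else if (c > 0) {     // y is smaller: take it, advance b
                out.add(y);
                y = ib.hasNext() ? ib.next() : null;
            } else {                // equal: keep one copy, advance both
                out.add(x);
                x = ia.hasNext() ? ia.next() : null;
                y = ib.hasNext() ? ib.next() : null;
            }
        }
        // At most one of the two sets still has elements left.
        while (x != null) {
            out.add(x);
            x = ia.hasNext() ? ia.next() : null;
        }
        while (y != null) {
            out.add(y);
            y = ib.hasNext() ? ib.next() : null;
        }
        return out;
    }

    public static void main(String[] args) {
        SortedSet<Integer> a = new TreeSet<>(Arrays.asList(1, 3, 5));
        SortedSet<Integer> b = new TreeSet<>(Arrays.asList(2, 3, 6));
        // prints: [1, 2, 3, 5, 6]
        System.out.println(union(a, b));
    }
}
```

Whether this beats addAll in practice depends on the set sizes and on what you need afterwards; turning the list back into a TreeSet has its own cost.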

It is not possible to merge the trees or join the sets as in disjoint-set data structures, because you don't know whether the elements of the two trees are disjoint. Since neither data structure has any knowledge of the contents of the other tree, it is necessary to check whether each element exists in the other tree before adding it, or at least to try adding it and abort if you find it on the way. So it should be O(M log N).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow