There is no need to make a full copy of a Set
in order to insert an element into it. Internally, element are stored in a tree, which means that you only need to create new nodes along the path of the insertion. Untouched nodes can be shared between the pre-insertion and post-insertion version of the Set
. And as Deitrich Epp pointed out, in a balanced tree O(log(n))
is the length of the path of the insertion. (Sorry for omitting that important fact.)
Say your Tree
type looks like this:
data Tree a = Node a (Tree a) (Tree a)
| Leaf
... and say you have a Tree
that looks like this
let t = Node 10 tl (Node 15 Leaf tr')
... where tl
and tr'
are some named subtrees. Now say you want to insert 12
into this tree. Well, that's going to look something like this:
let t' = Node 10 tl (Node 15 (Node 12 Leaf Leaf) tr')
The subtrees tl
and tr'
are shared between t
and t'
, and you only had to construct 3 new Nodes
to do it, even though the size of t
could be much larger than 3.
EDIT: Rebalancing
With respect to rebalancing, think about it like this, and note that I claim no rigor here. Say you have an empty tree. Already balanced! Now say you insert an element. Already balanced! Now say you insert another element. Well, there's an odd number so you can't do much there.
Here's the tricky part. Say you insert another element. This could go two ways: left or right; balanced or unbalanced. In the case that it's unbalanced, you can clearly perform a rotation of the tree to balance it. In the case that it's balanced, already balanced!
What's important to note here is that you're constantly rebalancing. It's not like you have a mess of a tree, decided to insert an element, but before you do that, you rebalance, and then leave a mess after you've completed the insertion.
Now say you keep inserting elements. The tree's gonna get unbalanced, but not by much. And when that does happen, first off you're correcting that immediately, and secondly, the correction occurs along the path of the insertion, which is O(log(n))
in a balanced tree. The rotations in the paper you linked to are touching at most three nodes in the tree to perform a rotation. so you're doing O(3 * log(n))
work when rebalancing. That's still O(log(n))
.