Disjoint-set forests - why should the rank be increased by one when the roots found for two nodes have the same rank?

Question 1

Because in this case you attach one tree as a subtree of the other, which increases the height of the tree it is attached to.

Have a look at the following example:

1           3
|           |
2           4

In the above, the "rank" of each tree is 2.
Now, let's say 1 is going to be the new unified root. You will get the following tree:

  1
 / \
2   3
    |
    4

After the join, the rank of "1" is 3, i.e. rank_old(1) + 1, as expected.¹

As for your second question: because it would yield false heights for the trees.

Take the above example and merge the trees to get the tree of rank 3. With summed ranks, that tree would be recorded as rank 4 (2 + 2) although its height is only 3. What would happen if we then wanted to merge it with this tree²:

We would find that both ranks are 4 and merge them the same way we did before, without favoring the 'shorter' tree. That results in taller trees and, ultimately, worse time complexity.

(1) Disclaimer: The first part of this answer is taken from my answer to a similar question (though not identical due to your last part of the question)

(2) Note that the above tree is synthetically made; it cannot be created by an optimized disjoint-set forest algorithm, but it still demonstrates the issues needed for the answer.

Question 2

First, what is rank? It is almost the same as the height of a tree. In fact, for now, pretend that it is the same as the height.

We want to keep trees short, so keeping track of the height of every tree helps us do that. When unioning two trees of different height, we make the root of the shorter tree a child of the root of the taller tree. Importantly, this does not change the height of the taller tree. That is, the rank of the taller tree does not change.

However, when unioning two trees of the same height, we make one root the child of the other, and this increases the height of that overall tree by one, so we increase the rank of that root by one.
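The rule described above can be sketched in a few lines. This is only a minimal illustration, not a reference implementation: the class name `DisjointSet` and its method names are my own, ranks start at 0, and path compression is deliberately left out so that rank really is the height here.

```python
class DisjointSet:
    """Union by rank only (no path compression), so rank equals tree height."""

    def __init__(self, n):
        self.parent = list(range(n))  # every element starts as its own root
        self.rank = [0] * n           # a single node has height 0

    def find(self, x):
        # Walk up the parent pointers until we reach a root.
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra                 # already in the same set
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra           # make ra the taller (or equal) root
        self.parent[rb] = ra          # shorter tree goes under the taller root
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1        # equal heights: the result is one taller
        return ra


ds = DisjointSet(5)
ds.union(0, 1)          # ranks 0 and 0 -> root 0 gets rank 1
ds.union(2, 3)          # ranks 0 and 0 -> root 2 gets rank 1
ds.union(0, 4)          # ranks 1 and 0 -> root 0 stays at rank 1
root = ds.union(0, 2)   # ranks 1 and 1 -> root 0 gets rank 2
print(root, ds.rank[root])  # prints "0 2"
```

Note that the only line that ever changes a rank is the one guarded by the equality check.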

Now, I said that rank was almost the same as the height of the tree. Why almost? Because of path compression, a second technique used by the union-find data structure to keep trees short. Path compression can alter an existing tree to make it shorter than indicated by its rank. In principle, it might be better to make decisions based on the actual height than using rank as a proxy for height, but in practice, it is too hard/too slow to keep track of the true height information, whereas it is very easy/fast to keep track of rank.
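To see how rank and height can drift apart, here is a small sketch. The helper names `find` and `depth` and the hand-built `parent` list are assumptions for illustration; the list is the shape that union by rank would give for four elements after merging two pairs and then merging the results.

```python
def find(parent, x):
    """Return the root of x and point every node on the path directly at it."""
    root = x
    while parent[root] != root:
        root = parent[root]
    # Second pass: path compression.
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def depth(parent, x):
    """Number of edges from x up to its root."""
    d = 0
    while parent[x] != x:
        x = parent[x]
        d += 1
    return d


# Tree: 0 is the root, 1 and 2 are its children, 3 is a child of 2.
parent = [0, 0, 0, 2]
rank_of_root = 2  # what union by rank would have recorded for root 0

height_before = max(depth(parent, x) for x in range(4))
find(parent, 3)   # compresses the path 3 -> 2 -> 0
height_after = max(depth(parent, x) for x in range(4))

print(height_before, height_after, rank_of_root)  # prints "2 1 2"
```

The stored rank is still 2 even though the tree now has height 1, which is why rank is only an upper bound on the height once path compression is in play.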

You also asked "What happens if I simply add the two ranks (i.e. 2*r)?" This is an interesting question. The answer is probably nothing, meaning everything will still work just fine, with the same efficiency as before. (Well, assuming that you use 1 as your starting rank rather than 0.) Why? Because the way rank is used, what matters is the relative ordering of ranks, not their absolute magnitudes. If you add them, then your ranks will be 1,2,4,8 instead of 1,2,3,4 (or more likely 0,1,2,3), but they will still have exactly the same relative ordering so all is well. Your rank is simply 2^(the old rank). The biggest danger is that you run a larger risk of overflowing the integer used to represent the rank when dealing with very large sets (or, put another way, that you will need to use more space to store your ranks).
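A quick way to convince yourself of this is to run the same sequence of unions under both policies and compare. This is just a sketch under my own assumptions: `run` is a hypothetical helper, `bump` is the function applied to the root's rank when two equal ranks meet, and there is no path compression.

```python
def run(pairs, n, start, bump):
    """Apply the unions in `pairs`; return the final parent and rank lists."""
    parent = list(range(n))
    rank = [start] * n

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        if rank[ra] < rank[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        if rank[ra] == rank[rb]:
            rank[ra] = bump(rank[ra])
    return parent, rank


pairs = [(0, 1), (2, 3), (4, 5), (6, 7), (0, 2), (4, 6), (0, 4), (1, 7)]
p_inc, r_inc = run(pairs, 8, start=0, bump=lambda r: r + 1)  # usual rule
p_dbl, r_dbl = run(pairs, 8, start=1, bump=lambda r: r + r)  # add the two ranks

print(p_inc == p_dbl)                      # prints "True"
print(r_dbl == [2 ** r for r in r_inc])    # prints "True"
```

Both runs build exactly the same forest, and every doubled rank is 2 raised to the ordinary rank.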

On the other hand, notice that by adding the two ranks, you are approximating the size of the trees rather than the heights of the trees. By always adding the two ranks, whether they are equal or not, then you are exactly tracking the sizes of the trees. Again, everything works just fine, with the same caveats about the possibility of overflowing integers if your trees are very large.

In fact, union-by-size is widely recognized as a legitimate alternative to union-by-rank. For some applications, you actually want to know the sizes of the sets, and for those applications union-by-size is preferable to union-by-rank.
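For comparison, a union-by-size variant might look like the following. Again this is a sketch rather than a definitive implementation; `SizedDisjointSet` and `set_size` are names I made up, and path compression is omitted to keep it short.

```python
class SizedDisjointSet:
    """Union by size: the root of the smaller set goes under the larger one."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n           # every element starts as a set of size 1

    def find(self, x):
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra           # make ra the root of the larger set
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]  # always add, equal or not
        return ra

    def set_size(self, x):
        """Number of elements in the set containing x."""
        return self.size[self.find(x)]


s = SizedDisjointSet(6)
s.union(0, 1)
s.union(2, 3)
s.union(0, 2)
s.union(4, 0)
print(s.set_size(3), s.set_size(5))  # prints "5 1"
```

The size is updated on every union, unlike rank, which only changes when the two roots tie.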

Question 3

If you read that paragraph in a little more depth, you'll realize that rank is more like depth, not size:

Since it is the depth of the tree that affects the running time, the tree with smaller depth gets added under the root of the deeper tree, which only increases the depth if the depths were equal. In the context of this algorithm, the term "rank" is used instead of "depth" ...

and a merge of equal depth trees only increases the depth of the tree by one since the root of the one is added to the root of the other.

Consider:

  A                  D
 / \   merged with  / \
B   C              E   F

is:

  A
 /|\
B C D
   / \
  E   F

The depth was 2 for both, and it's 3 for the merged one.
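The same example can be checked mechanically. In this sketch the `children` dictionary and the `depth` helper are my own illustration; depth is counted in levels, so a lone node has depth 1, matching the numbers above.

```python
def depth(children, node):
    """Number of levels in the subtree rooted at node (a lone node has depth 1)."""
    return 1 + max((depth(children, c) for c in children.get(node, [])), default=0)


children = {"A": ["B", "C"], "D": ["E", "F"]}
depth_a = depth(children, "A")
depth_d = depth(children, "D")

children["A"].append("D")          # attach D's root under A's root
depth_merged = depth(children, "A")

print(depth_a, depth_d, depth_merged)  # prints "2 2 3"
```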

Question 4

Rank represents the depth of the tree, not the number of nodes in it. When you join a tree with a smaller rank with a tree with a larger rank, the overall rank remains the same.

Consider adding a tree with rank 4 to the root of a tree of rank 6: since we added a node above the root of the depth-4 tree, that subtree now has a rank of 5. The tree to which we've added our depth-4 tree, however, already has rank 6, so the overall rank does not change.

Now consider adding a tree with rank 6 to the root of a second tree of rank 6: since the root of the first depth-6 tree now has an extra node above it, the rank of that subtree (and the tree overall) changes to 7.

Since the rank of the tree determines the processing speed, the algorithm tries to keep the rank as low as possible by always attaching a shorter tree to the taller one, keeping the overall rank unchanged. The rank changes only when the trees have identical ranks, in which case one of them gets attached to the root of the other, bumping up the rank by one.

Question 5

Two things need to be understood here:

1) What is rank? 2) Why is rank used?

Rank is simply the depth (level) of a tree. When we union nodes, the graph nodes are arranged into a tree with a single ultimate root node. Rank is only meaningful for those root nodes.

A merged with D

Initially A has rank (level) 0 and D has rank (level) 0, so you can merge them with either one as the root: whichever of A or D becomes the root, the resulting rank (level) is 1.

A
 `D

Here the rank (level) is 1, with A as the root.

Now consider another case:

A    merge   B     ----->   A 
 `D           `C           / \
                          D   B
                               \
                                C

So the rank increases by 1. Not counting the root A, the deepest node is now at depth 2: level 1 holds {D, B} and level 2 holds {C}.

Our main objective is to keep the rank (depth) of the tree as small as possible while merging.

Now, when two trees of different rank are merged:

A (rank 0)  merge  B (rank 1)  --->   B
                    `C               / \
                                    A   C

Here the merged tree's rank is 1, the same as the higher rank (1).

When the lower-rank tree goes under the higher-rank tree, the merged tree's rank (height/depth) is the same as the rank of the higher-rank tree. In other words, the rank does not increase.

But if we do the reverse and put the higher-rank tree under the lower-rank tree, see what happens:

A (rank 0)  merge  B (rank 1)  --->  A
                    `C                `B
                                        `C

Here the merged tree has rank 2, greater than both.

So the observation is this: if we want to keep the rank (height) of the merged tree as small as possible, we have to choose the first approach.
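The two merge orders can be compared directly. This is a small sketch with names of my own choosing: each dictionary maps a node to its parent (a root maps to itself), and `height` counts edges, so a lone node has height 0 as in the ranks used above.

```python
def height(parent, root):
    """Longest path (in edges) from any node up to root."""
    best = 0
    for node in parent:
        x, d = node, 0
        while parent[x] != x:
            x = parent[x]
            d += 1
        if x == root:
            best = max(best, d)
    return best


# A is a single node (rank 0); B has one child C (rank 1).
low_under_high = {"A": "B", "B": "B", "C": "B"}  # A attached under B
high_under_low = {"A": "A", "B": "A", "C": "B"}  # B attached under A

print(height(low_under_high, "B"))  # prints "1"
print(height(high_under_low, "A"))  # prints "2"
```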

Now you need to understand why we want to keep the tree's height as small as possible.

In a disjoint-set union, a find operation (locating the ultimate root a node is connected to, with path compression applied along the way) traverses from a node up to its root. If the height (rank) is large, that traversal is slow. That is why, when we merge two trees, we try to keep the height/depth/rank as small as possible.