Disjoint sets data structures and binomial trees?

Question 1

Disjoint set data structures are data structures for representing a partition of a set S. You begin with a set S of elements, each of which belongs to its own group. For example:

{1} {2} {3} {4} {5} {6}

One operation on a disjoint-set data structure is the union operation, which combines together the two sets containing the given elements. For example, unioning together 1 and 2 gives back the partition

{1, 2} {3} {4} {5} {6}

Unioning together 3 and 5 produces

{1, 2} {3, 5} {4} {6}

Now, unioning together 1 and 3 produces the partition

{1, 2, 3, 5} {4} {6}

The find operation tells you which set a given element belongs to. Typically, this is done by having find return a representative element of the set the element belongs to. The representative is chosen such that

find(x) == find(y)  if and only if  x and y are in the same set.

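For example, after the unions above, find(1) might return 2, in which case find(2) = 2, find(3) = 2, and find(5) = 2 as well. Which member serves as the representative is arbitrary; all that matters is that every element of the set agrees on it.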
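To make the semantics concrete, here is a minimal Python sketch that replays the unions above. It assumes the simplest possible representation, a dictionary that maps each element to its set's representative and is relabelled on every union. This is not the efficient forest discussed later, and the class name `NaivePartition` is made up for illustration.

```python
class NaivePartition:
    """Toy disjoint-set: maps every element to its set's representative."""

    def __init__(self, elements):
        # Initially every element is alone in its own set.
        self.rep = {x: x for x in elements}

    def find(self, x):
        return self.rep[x]

    def union(self, x, y):
        rx, ry = self.rep[x], self.rep[y]
        if rx == ry:
            return  # already in the same set
        # Relabel every member of x's set; this costs O(n) per union.
        for element, r in self.rep.items():
            if r == rx:
                self.rep[element] = ry

    def sets(self):
        # Group elements by representative and return sorted lists.
        groups = {}
        for element, r in self.rep.items():
            groups.setdefault(r, []).append(element)
        return sorted(sorted(g) for g in groups.values())


p = NaivePartition(range(1, 7))
p.union(1, 2)
p.union(3, 5)
p.union(1, 3)
print(p.sets())                # [[1, 2, 3, 5], [4], [6]]
print(p.find(1) == p.find(5))  # True
print(p.find(1) == p.find(4))  # False
```

In this sketch the representative of {1, 2, 3, 5} happens to be 5 rather than 2, which is fine: only the "same representative if and only if same set" property matters.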

Disjoint set data structures are often used as a subroutine in Kruskal's minimum spanning tree algorithm, as they provide a very fast way of checking whether two nodes in the graph are already connected and an easy way of recording, when an edge is added, that all nodes in the two connected components it joins are now connected to one another. Using the disjoint-set forest implementation with union-by-rank and path compression, n operations on a disjoint-set forest can be done in O(n α(n)) time, where α(n) is the inverse Ackermann function, a function that grows so slowly it's effectively a constant (it's at most four for any input smaller than the number of atoms in the universe).
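As a rough illustration of how Kruskal's algorithm leans on the structure, here is a small Python sketch. It assumes nodes are numbered 0..num_nodes-1 and edges are given as (weight, u, v) tuples; the function name `kruskal` and that input format are choices made for this example only.

```python
def kruskal(num_nodes, edges):
    """Return the edges of a minimum spanning forest.

    `edges` is a list of (weight, u, v) tuples over nodes 0..num_nodes-1.
    """
    parent = list(range(num_nodes))  # each node starts as its own root
    rank = [0] * num_nodes

    def find(x):
        # First pass: locate the root of x's tree.
        root = x
        while parent[root] != root:
            root = parent[root]
        # Second pass (path compression): point the whole path at the root.
        while parent[x] != root:
            next_x = parent[x]
            parent[x] = root
            x = next_x
        return root

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx == ry:
            return False          # already connected: edge would form a cycle
        if rank[rx] < rank[ry]:
            rx, ry = ry, rx       # union by rank: rx is the higher-ranked root
        parent[ry] = rx
        if rank[rx] == rank[ry]:
            rank[rx] += 1
        return True

    tree = []
    for weight, u, v in sorted(edges):   # cheapest edges first
        if union(u, v):
            tree.append((weight, u, v))
    return tree


edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3), (5, 1, 3)]
mst = kruskal(4, edges)
print(mst)                        # [(1, 0, 1), (2, 2, 3), (3, 1, 2)]
print(sum(w for w, _, _ in mst))  # 6
```

The union call does double duty here: it answers "are these two nodes already connected?" and, if not, merges their components.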

As for binomial trees and binary trees: I think what you are asking about is how to represent binomial trees, which are many-way trees, using binary trees, which have at most two children. Not all binomial trees are binary trees, so a suitable encoding must be used.

One way to do this is using something called the left-child right-sibling representation. This represents a many-way tree as a binary tree according to the following setup:

The left child of each node points to the node's first child.
The right child of each node points to its next sibling (node in the same layer with the same parent).

For example, given this binomial tree:

     a
   / | \
  b  c  d
 /|  |
e f  g
  |
  h

The left-child right-sibling representation would be:

          a
         /
        b
       / \
      /   \
     e     c
      \   / \
       f g   d
      /
     h
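In code, the conversion could be sketched as follows in Python. The names `BinaryNode` and `to_lcrs`, and the dictionary encoding of the many-way tree, are illustrative assumptions rather than anything standard.

```python
class BinaryNode:
    def __init__(self, label):
        self.label = label
        self.left = None   # first child in the original tree
        self.right = None  # next sibling in the original tree


def to_lcrs(label, children):
    """Convert a many-way tree to left-child right-sibling form.

    `children` maps a label to the ordered list of its children's labels.
    """
    node = BinaryNode(label)
    previous = None
    for child_label in children.get(label, []):
        child = to_lcrs(child_label, children)
        if previous is None:
            node.left = child       # first child hangs off the left pointer
        else:
            previous.right = child  # later children chain via right pointers
        previous = child
    return node


# The example tree: a has children b, c, d; b has e, f; c has g; f has h.
children = {"a": ["b", "c", "d"], "b": ["e", "f"], "c": ["g"], "f": ["h"]}
root = to_lcrs("a", children)

print(root.left.label)             # b  (a's first child)
print(root.left.right.label)       # c  (b's next sibling)
print(root.left.left.right.label)  # f  (e's next sibling)
print(root.right)                  # None (the root has no sibling)
```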

By the way - if you do this on binomial trees, you end up with a representation of a binomial tree as something called a half-ordered half-tree, which is a binary tree with the following properties:

Every node in the tree is greater than or equal to (or less than or equal to, depending on whether this is a min-heap or a max-heap) every node in its left subtree.
The root node has no right child.

These properties follow from the fact that a binomial tree is heap-ordered and then converted into a left-child right-sibling representation. Using this representation, it is extremely fast to link together two binomial trees. I'll leave that as an exercise to the reader. :-)
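For readers who want to check their answer to that exercise, one possible way to do the link is sketched below in Python. It assumes min-heap order and two trees of the same order; `HalfTreeNode` and `link` are hypothetical names chosen for this example.

```python
class HalfTreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None   # first child (left-child right-sibling form)
        self.right = None  # next sibling; always None at the root


def link(x, y):
    """Link two half-tree roots (min-heap order) in O(1); return the new root."""
    if y.key < x.key:
        x, y = y, x     # make x the root with the smaller key
    y.right = x.left    # x's former children become y's siblings
    x.left = y          # y becomes x's first child
    return x


# Link single nodes into two order-1 trees, then into one order-2 tree.
t1 = link(HalfTreeNode(5), HalfTreeNode(3))
t2 = link(HalfTreeNode(4), HalfTreeNode(7))
root = link(t1, t2)

print(root.key)             # 3
print(root.left.key)        # 4
print(root.left.right.key)  # 5
print(root.left.left.key)   # 7
print(root.right)           # None
```

Only two pointers change per link, and the result is again a half-ordered tree whose root has no right child.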

Hope this helps!

Question 2

The disjoint sets I learned at uni revolved around three essential functions.

make_set(x) - makes a new disjoint set that contains only the element x
find_set(x) - gives you the set that contains element x
union(x,y) - unions the sets that contain x and y

The implementation they mentioned used linked lists. That is, each set has as its representative the element that created the set (via make_set(x)), and with union(x,y) the end pointer of x's list is moved to point to y's list. union and make_set are fast, but this was pretty slow for find_set (in fact O(size of the biggest set)).

The better implementation used two techniques. The first, path compression, takes each element that is passed over during a union and/or find_set and makes it point directly to the representative of its set.

The other, union-by-rank, maintains a rank for each set that bounds the greatest 'depth' of the set. When two sets are unioned and their ranks are the same, one representative is changed to point to the other and the rank of the surviving representative goes up by one. If the ranks are different, the representative of the lower-ranked set is changed to point to the representative of the higher-ranked set, and the rank is left unchanged. With both techniques combined, the asymptotic upper bound on the total running time is really close to just the number of calls made to these functions.
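A minimal Python sketch of the three functions with both techniques might look like this. The module-level dictionaries `parent` and `rank` are an assumption made to keep the example short; a real implementation would more likely wrap them in a class.

```python
parent = {}
rank = {}


def make_set(x):
    parent[x] = x   # x is its own representative
    rank[x] = 0


def find_set(x):
    if parent[x] != x:
        # Path compression: point x straight at the representative.
        parent[x] = find_set(parent[x])
    return parent[x]


def union(x, y):
    rx, ry = find_set(x), find_set(y)
    if rx == ry:
        return
    if rank[rx] < rank[ry]:
        rx, ry = ry, rx    # ensure rx has the larger (or equal) rank
    parent[ry] = rx        # lower-ranked root points to the other root
    if rank[rx] == rank[ry]:
        rank[rx] += 1      # equal ranks: the surviving root's rank grows


for x in "abcd":
    make_set(x)
union("a", "b")   # equal ranks 0 and 0: rank of "a" becomes 1
union("c", "d")   # likewise, rank of "c" becomes 1
union("a", "c")   # equal ranks 1 and 1: rank of "a" becomes 2

print(parent["d"])    # c  (two hops from the root before any find)
print(find_set("d"))  # a
print(parent["d"])    # a  (compressed by the find above)
print(rank["a"])      # 2
```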

Hope that helps.

Question 3

A disjoint-set is basically a union-find data structure.

You originally have a set of n nodes, and you have find(node) and union(node1,node2) operations on it.

union(node1,node2) combines the sets containing node1 and node2 into one set
find(node) finds the canonical representative of node's set (by giving the root, for example, as explained later)

For example, you originally have {1},{2},{3},{4},{5}, and you do:

union(1,2)
union(3,4)

Then you end up having {1,2},{3,4},{5}.
This also means that at this point find(1) == find(2) (they are in the same set!)

If you later union(2,3) - it will result in unioning the set containing 2 with the set containing 3, and you will end up with {1,2,3,4},{5}

Regarding a video request: This lecture from Berkeley seems to cover the material pretty well.

Regarding binary trees: trees (not necessarily binary ones) are one way to implement this. Each set is a tree whose root has its children beneath it, but the tree is actually "upside down": instead of having pointers from parent to children, you have pointers from children to parent.
This way, the canonical representative of each node is the root that the node leads up to, and this ensures that if we did union on a and b, then find(a) == find(b), because they have the same root.
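Here is a bare-bones Python sketch of that "upside down" forest, replaying the example from earlier in this answer. It deliberately leaves out any balancing or compression tricks, and the dictionary named `parent` is just one assumed way of storing the child-to-parent pointers.

```python
parent = {n: n for n in range(1, 6)}   # every node starts as its own root


def find(node):
    # Follow parent pointers upward until reaching a root.
    while parent[node] != node:
        node = parent[node]
    return node


def union(node1, node2):
    root1, root2 = find(node1), find(node2)
    if root1 != root2:
        parent[root2] = root1   # hang one root under the other


union(1, 2)
union(3, 4)
same_before = find(2) == find(3)
union(2, 3)
same_after = find(1) == find(4)

print(same_before)  # False: {1,2} and {3,4} are still separate
print(same_after)   # True: the sets are now {1,2,3,4},{5}
print(find(5))      # 5
```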

I hope it gives you some leads on what this DS is.
Good luck!