Question

I've got a perfect binary tree that's enumerated the post-order way. An example of such tree would be

                         15
                 7               14
             3       6       10      13
           1   2   4   5    8  9   11  12

The size of the tree is known to me. I'm looking for a formula or a simple algorithm that would take a single number as input (the ID of the vertex I'm interested in) and return a single number as well - the ID of the parent. It's quite easy to traverse the tree from the top and get the result in O(log n). Is there a faster solution? I'm mostly interested in the leaves, so if there's a solution for the special case, bring it on too.


Solution

The parent index can be found in O(log* n) time and O(1) space.

Here log* n means the iterated logarithm: the number of times the logarithm function must be applied before the result is less than or equal to 1.

Actually that may be done even faster - in O(1) time if we could afford O(n) space for a large lookup table (storing parent index for each node in the tree).

Below I'll sketch several algorithms that need no extra space and give the result in O(log n) worst-case time, O(log log n) expected time, O(log log n) worst-case time, and O(log* n) worst-case time. They are based on the following properties of post-order indexes in a perfect binary tree:

  1. All indexes on the leftmost path of the tree are equal to 2^i - 1.
  2. The index of the right child of every node on the leftmost path is equal to 2^i - 2.
  3. Any node on the leftmost path and its right sub-tree are labeled with indexes having the most significant non-zero bit in the same position: i.
  4. The left sub-tree of any node on the leftmost path contains 2^i - 1 nodes. (Which means that after subtracting 2^i - 1 we get a node positioned the same way relative to its parent, with the same depth, but closer to the "special" nodes satisfying properties #1 and #2.)

Properties #1 and #2 give simple algorithms to get the parent for some nodes of the tree: for indexes of the form 2^i - 1, multiply by 2 and add 1; for indexes of the form 2^i - 2, just add 1. For other nodes we can repeatedly use property #4 to reach a node satisfying property #1 or #2 (by subtracting the sizes of several left sub-trees), find the parent of that node on the leftmost path, then add back all the previously subtracted values. And property #3 allows us to quickly find which sub-tree sizes should be subtracted. So we have the following algorithm:

  1. While ((x+1) & x) != 0 and ((x+2) & (x+1)) != 0 repeat step 2.
  2. Clear most-significant non-zero bit and add 1. Accumulate the difference.
  3. If ((x+1) & x) == 0, multiply by 2 and add 1; otherwise if ((x+2) & (x+1)) == 0, add 1.
  4. Add back all differences accumulated on step 2.

For example, 12 (in binary form 0b1100) is transformed on step #2 to 0b0101, then to 0b0010 (or 2 in decimal). Accumulated difference is 10. Step #3 adds 1 and step #4 adds back 10, so the result is 13.

Other example: 10 (in binary form 0b1010) is transformed on step #2 to 0b0011 (or 3 in decimal). Step #3 doubles it (6), then adds 1 (7). Step #4 adds back accumulated difference (7), so the result is 14.
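The steps and worked examples above can be sketched in Python like this (a minimal sketch; the function name and structure are mine, not code from the answer):

```python
def parent(x):
    """Parent of node x (post-order index) via the bit-by-bit O(log n) algorithm."""
    diff = 0
    # Steps #1-#2: while x is neither of the form 2^i - 1 nor 2^i - 2,
    # clear the most significant set bit, add 1, and accumulate the difference.
    while ((x + 1) & x) != 0 and ((x + 2) & (x + 1)) != 0:
        y = (x & ~(1 << (x.bit_length() - 1))) + 1
        diff += x - y
        x = y
    # Step #3: the two simple cases on the leftmost path.
    if ((x + 1) & x) == 0:      # x = 2^i - 1: node on the leftmost path
        x = 2 * x + 1
    else:                       # x = 2^i - 2: right child of such a node
        x += 1
    # Step #4: add back the accumulated difference.
    return x + diff
```

For example, parent(12) follows the first worked trace (12 → 5 → 2, difference 10, then 2 → 3, plus 10) and yields 13.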

Time complexity is O(log n) - but only when all elementary operations are executed in O(1) time.

To improve the time complexity we could perform several iterations of step #2 in parallel. We could take the n/2 high-order bits of the index and compute the population count on them. If, after adding the result to the remaining low-order bits, the sum does not overflow into these high-order bits, we can apply this approach recursively, with O(log log n) complexity. On overflow, we could roll back to the original bit-by-bit algorithm. Note that all set low-order bits should also be treated as overflow. So the resulting complexity is O(log log n) expected time.
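To see why the batched step is valid, here is a small Python sketch (the helper names are mine): adding the population count of the high half to the low half does the same work as one bit-by-bit step per high-order set bit, provided no carry reaches the high half:

```python
def step_bitwise(x):
    # One iteration of step #2: clear the most significant set bit, add 1.
    return (x & ~(1 << (x.bit_length() - 1))) + 1

def step_batched(x, low_bits):
    # Process all high-order set bits at once: keep the low bits and
    # add the population count of the high bits.
    mask = (1 << low_bits) - 1
    return (x & mask) + bin(x >> low_bits).count("1")
```

For x = 0b110100101 with a 4-bit low half, three bit-by-bit steps and one batched step both give 0b1000, and since the result still fits in the low half there is no overflow to roll back.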

Instead of rolling back to bit-by-bit processing we could handle overflow with binary search. We could determine how many high-order bits (fewer than n/2) are to be selected so that we either have no overflow or (as for index 0b00101111111111) the number of selected non-zero high-order bits is exactly 1. Note that we do not need to apply this binary search procedure more than once, because a second overflow cannot happen while the number of bits in the number is greater than O(log log n). So the resulting complexity is O(log log n) worst-case time. All elementary operations are assumed to be executed in O(1) time. If some operations (population count, leading zero count) are implemented in O(log log n) time, then our time complexity increases to O(log^2 log n).

Instead of dividing the bits of the index into two equal sets we could use a different strategy:

  1. Compute the population count of the index and add it to the index value. The most significant bit that changed from 0 to 1 determines the separation point between high-order and low-order bits.
  2. Compute the population count of the high-order bits, then add the result to the low-order bits.
  3. If the "separation" bit is non-zero and ((x+1) & x) != 0 and ((x+2) & (x+1)) != 0, continue from step #1.
  4. If all low-order bits are set and the least significant high-order bit is set, process this corner case as a right child.
  5. If ((x+1) & x) != 0 and ((x+2) & (x+1)) != 0, also process this as a right child.
  6. If ((x+1) & x) == 0, multiply by 2 and add 1; otherwise, if ((x+2) & (x+1)) == 0, add 1.

If the condition of step #3 is satisfied, this means the addition on step #2 resulted in a carry into the "separation" bit. The other low-order bits represent some number that cannot be greater than the original population count. The number of set bits in this number cannot be greater than the logarithm of the population count of the original value. This means that the number of set bits after each iteration is at most the logarithm of the number of set bits on the previous iteration. Therefore the worst-case time complexity is O(log* n). This is very close to O(1). For example, for 32-bit numbers we need approximately 2 iterations or fewer.

Every step of this algorithm should be obvious, except probably step #5, whose correctness needs to be proven. Note that this step is executed only when adding the population count results in a carry into the "separation" bit, but adding the population count of only the high-order bits does not. The "separation" bit corresponds to the value 2^i. The difference between the population count of all bits and the population count of only the high-order bits is at most i. So step #5 deals with a value of at least 2^i - i. Let's apply the bit-by-bit algorithm to this value. 2^i - i is large enough that bit i-1 is set. Clear this bit and add 1 to the value in bits 0..i-2. This value is at least 2^(i-1) - (i-1), because we just subtracted 2^(i-1) and added 1. Or, if we move the index one position to the right, we again have at least 2^i - i. If we repeat this procedure, we'll always find a non-zero bit at position i-1. Each step gradually decreases the difference between the value in bits 0..i-1 and the nearest power of 2. When this difference comes down to 2, we can stop the bit-by-bit algorithm, because the current node is clearly a right child. Since this bit-by-bit algorithm always comes to the same result, we can skip it and always process the current node as a right child.

Here is a C++ implementation of this algorithm. More details and some other algorithms can be found on ideone.

#include <cstdint>

// Helpers assumed by the code below; GCC/Clang builtins are one way
// to implement them in O(1).
uint32_t popcnt(const uint32_t x)
{ return __builtin_popcount(x); }

uint32_t getMSBindex(const uint32_t x)
{ return 31 - __builtin_clz(x); }

uint32_t getMSBmask(const uint32_t x)
{ return 1 << getMSBindex(x); }

uint32_t notSimpleCase(const uint32_t x)
{ return ((x+1) & x) && ((x+2) & (x+1)); }

uint32_t parent(const uint32_t node)
{
    uint32_t x = node;
    uint32_t bit = x;

    while ((x & bit) && notSimpleCase(x))
    {
        const uint32_t y = x + popcnt(x);
        bit = getMSBmask(y & ~x);
        const uint32_t mask = (bit << 1) - 1;
        const uint32_t z = (x & mask) + popcnt(x & ~mask);

        if (z == mask && (x & (bit << 1)))
            return node + 1;

        x = z;
    }

    if (notSimpleCase(x))
        return node + 1;
    else
        return node + 1 + (((x+1) & x)? 0: x);
}

If we need to find parent only for a leaf node, this algorithm and code may be simplified. (Ideone).

uint32_t parent(const uint32_t node)
{
    uint32_t x = node;

    while (x > 2)
    {
        const uint32_t y = x + popcnt(x);
        const uint32_t bit = getMSBmask(y & ~x);
        const uint32_t mask = (bit << 1) - 1;
        x = (x & mask) + popcnt(x & ~mask);
    }

    return node + 1 + (x == 1);
}

OTHER TIPS

function getParent(node, size)
{
    var rank = size, index = size;

    while (rank > 0) {
        var leftIndex = index - (rank + 1)/2;
        var rightIndex = index - 1;

        if (node == leftIndex || node == rightIndex) {
            return index;
        }

        index = (node < leftIndex ? leftIndex : rightIndex);
        rank  = (rank - 1)/2;
    }
}

It starts from the root, deciding which branch to step down into, and repeats until the node is found. The rank is the index of the leftmost node on the same level: 1, 3, 7, 15, ..., 2^k - 1.

The input parameters are:

  • node – The index of the node you want the parent of. (1-based)
  • size – The index of the root node; 15 in your example.

Example:

>>> r = []; for (var i = 1; i <= 15; i++) r.push(getParent(i,15)); r;
[3, 3, 7, 6, 6, 7, 15, 10, 10, 14, 13, 13, 14, 15, undefined]

Let's take a look at your tree:

                     15
             7               14
         3       6       10      13
       1   2   4   5    8  9   11  12

Rewrite label n as 15-n. Then we get:

                     0
             8               1
         12      9        5       2
       14  13  11  10   7   6   4   3 

which could also be written as

                     0
             +8              +1
         +4      +1      +4      +1
       +2  +1  +2  +1  +2  +1  +2   +1 

Well there's a pattern for you. So in this labelling scheme, left children are 2^(i+1) more than their parents, where i is the height of the child, while right children are 1 more than their parents. How can we work out the height of a child though, and whether it's a left or right child?

Unfortunately, I can't work out any way to get this information without working out the entire path to the node, which means logarithmic time. You can however deduce the path to the node directly from the node's label (demonstrated here for a height-3 tree):

  • Suppose we have a label n
  • If n == 0, it's the root.
  • Otherwise:
    • If n - 8 >= 0, it's in the left subtree of the root. Set n = n-8.
    • If n - 8 < 0, it's in the right subtree of the root. Set n = n-1.
  • If n == 0 now, it's the root of whatever subtree was discovered in the last step.
  • Otherwise:
    • If n - 4 >= 0, it's in the left subtree of whatever subtree was discovered in the last step. Set n = n-4
    • If n - 4 < 0, it's in the right subtree of whatever subtree was discovered in the last step. Set n = n-1.
  • Et cetera until you've got down to testing n-1 >= 0.

You could do all this using bit arithmetic and -1s and it'd be spectacularly fast in real-world terms (calculating this in a trillion-node tree would only take ~12 times longer than a 10 node tree (ignoring memory issues)) but it's still going to be logarithmic time.

Anyway, once you know the height of the label though and whether it's a left or right child, you can easily calculate the label of the parent using the relations I mentioned earlier.
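The descent and the relations above can be combined into one routine; here is a Python sketch (the function name is mine, and as noted this still takes logarithmic time because it walks the path from the root):

```python
def parent_via_relabel(node, size):
    # Relabel n as size - n: the root becomes 0, a left child is
    # 2^(i+1) more than its parent, and a right child is 1 more.
    m = size - node
    step = (size + 1) // 2        # left-child offset at the root's level
    last = 0                      # offset of the last edge on the path
    while m != 0:
        if m >= step:             # node lies in the left subtree
            m -= step
            last = step
        else:                     # node lies in the right subtree
            m -= 1
            last = 1
        step //= 2
    # Undo the relabelling: the parent's relabelled value is
    # (size - node) - last, so in the original labels it is node + last.
    return node + last
```

On the example tree, parent_via_relabel(12, 15) returns 13 and parent_via_relabel(1, 15) returns 3.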

If you're allowed to query for the IDs of the children of a node, you can do some useful things.

Trivial case 1: if x = size, it is the root.

Trivial case 2: if x is a leaf (query for child IDs to find out), try x + 1. If x + 1 is not a leaf (another query for child IDs), x was the right child of x + 1. If x + 1 is a leaf, x is the left child of x + 2.

For internal nodes: the children of x are x - 1 (the right child) and x - (1 << (height(x) - 1)) (the left child; the right child's sub-tree is a perfect binary tree, so it holds 2^(height(x)-1) - 1 nodes). So, using the difference between x and left_child(x), the height of x can be determined: height(x) = ctz(x - left_child(x)) + 1. But it's really the size of that sub-tree that's required, so you'd take 1 << (height - 1) anyway, and the ctz can be dropped.
So the parent of x is either x + 1 (iff right_child(x + 1) == x) or the parent of x is
x + (x - left_child(x)) * 2 (otherwise).

That's not as nice as just doing math on the ID, but assuming you're allowed to ask for the children of a node in constant time, this is a constant time algorithm.
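Under that assumption, the scheme might be sketched like this in Python (`left_child`/`right_child` stand in for the child-ID queries and return None for a leaf; the dictionaries merely simulate those O(1) queries for the 15-node example tree):

```python
def parent(x, size, left_child, right_child):
    if x == size:                        # trivial case 1: x is the root
        return None
    if left_child(x) is None:            # trivial case 2: x is a leaf
        if left_child(x + 1) is None:    # x + 1 is a leaf too...
            return x + 2                 # ...so x is the left child of x + 2
        return x + 1                     # otherwise x is the right child of x + 1
    if right_child(x + 1) == x:          # internal node that is a right child
        return x + 1
    return x + (x - left_child(x)) * 2   # internal node that is a left child

# Child tables simulating the constant-time queries for the example tree.
LEFT  = {3: 1, 6: 4, 7: 3, 10: 8, 13: 11, 14: 10, 15: 7}
RIGHT = {3: 2, 6: 5, 7: 6, 10: 9, 13: 12, 14: 13, 15: 14}
```

For instance, parent(12, 15, LEFT.get, RIGHT.get) returns 13 after at most a couple of queries.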

def answer(h, q=None):
    ans = []
    size = 2 ** h - 1                  # index of the root
    for x in (q or []):
        if x == size:
            ans.append(-1)             # the root has no parent
            continue
        cur_height = h
        num = size                     # root of the current subtree
        num_l = num - (2 ** cur_height - 2) // 2 - 1   # its left child
        num_r = num - 1                                # its right child
        # Walk down from the root until x is one of the two children.
        while x != num_r and x != num_l:
            num = num_r if x > num_l else num_l
            cur_height -= 1
            num_l = num - (2 ** cur_height - 2) // 2 - 1
            num_r = num - 1
        ans.append(num)
    return ans

Sorry for the non-answer answer, but I don't think it's possible to do it in less than O(log n), even allowing for constant-time arithmetic and bitwise logical ops.

Each bit in the node's index potentially has an effect on nearly every left/right/stop decision in the traversal from the node to the root, including the first one. Moreover, examining the indices of the nodes at each level of the tree, they are aperiodic (and not only in powers of 2). This means (I think) that a constant number of arithmetic ops is not sufficient to identify the level, or whether the node is a left or right child.

It's a fascinating problem, though, and I'd love to be proven wrong. I just scoured my copy of Hacker's Delight -- I had high hopes for some of the exotic number bases they play with, but nothing quite fit.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow