For a given binary tree find maximum binary search sub-tree

https://stackoverflow.com/questions/3163366

02-10-2019
|

Question

For a given binary tree, find the largest subtree which is also binary search tree?

Example:

Input:

                   10
               /         \
             50           150
            /  \         /   \
          25    75     200    20
         / \   / \    /  \    / \
        15 35 65 30  120 135 155 250

Output:

Solution

This answer previously contained an O(n log n) algorithm based on link/cut trees. Here is a simpler O(n) solution.

The core is a procedure that accepts a node, the unique maximum BSST rooted at its left child, the unique maximum BSST rooted at its right child, and pointers to the left-most and right-most elements of these BSSTs. It destroys its inputs (avoidable with persistent data structures) and constructs the unique maximum BSST rooted at the given node, together with its minimum and maximum elements. All BSST nodes are annotated with the number of descendants. As before, this procedure is called repeatedly from a post-order traversal. To recover the sub-tree, remember the root of the largest BSST; reconstructing it requires only a simple traversal.

I'll treat the left BSST only; the right is symmetric. If the root of the left BSST is greater than the new root, then the entire sub-tree is removed, and the new root is now left-most. Otherwise, the old left-most node is still left-most. Starting from the right-most node of the left BSST and moving upward, find the first node that is less than or equal to the root. Its right child must be removed; note now that due to the BST property, no other nodes need to go! Proceed to the root of the left BSST, updating the counts to reflect the deletion.

The reason this is O(n) is that in spite of the loop, each edge in the original tree is in essence traversed only once.

EDIT: collectively, the paths traversed are the maximal straight-line paths in a BST, except for the left spine and the right spine. For example, on the input

              H
             / \
            /   \
           /     \
          /       \
         /         \
        /           \
       /             \
      D               L
     / \             / \
    /   \           /   \
   /     \         /     \
  B       F       J       N
 / \     / \     / \     / \
A   C   E   G   I   K   M   O

here are the recursive calls on which each edge is traversed:

              H
             / \
            /   \
           /     \
          /       \
         /         \
        /           \
       /             \
      D               L
     / h             h \
    /   h           h   \
   /     h         h     \
  B       F       J       N
 / d     d h     h l     l \
A   C   E   G   I   K   M   O

OTHER TIPS

The previous algorithm (see revisions) was O(n^2) - we can generalize it to O(n log n) by noticing the facts that:

If b is the root of the largest BST and b.left.value < b.value, then b.left is also in the BST (same for b.right.value ≥ b.value)
If b is the root of the largest BST and a is also in the BST, then every node between a and b is in the BST.

So if c is between a and b, and c is not in the BST rooted by b, neither is a (due to (2.)). Using this fact, we can easily determine if a node is in the BST rooted by any given ancestor. We'll do this by passing a node into our function along with a list of its ancestors, and the associated min/maxValues that the current child-node would have to satisfy if indeed that ancestor were the root of the largest BST (we'll call this list ancestorList). We'll store the entire collection of potential-roots in overallRootsList

Let's define a structure called potentialRoot as follows:

Each potentialRoot contains the following values:
* node: The node we are considering for the root of the BST
* minValue and maxValue: the range another node must fall between to be part of the BST rooted by node (different for every node)
* subNodes: A list of the rest of the nodes in the largest BST rooted by node

The pseudo code looks like this (note that all lists mentioned are lists of potentialRoots):

FindLargestBST(node, ancestorList):
    leftList, rightList = empty lists
    for each potentialRoot in ancestorList:
        if potentialRoot.minValue < node.Value ≤ potentialRoot.maxValue:
            add node to potentialRoot.subNodes (due to (1.))
            (note that the following copies contain references, not copies, of subNodes)
            add copy of potentialRoot to leftList, setting maxValue = node.Value
            add copy of potentialRoot to rightList, setting minValue = node.Value

    add the potentialRoot (node, -∞, +∞) to leftList, rightList, and overallRootsList
    FindLargestBST(node.left, leftList)
    FindLargestBST(node.right, rightList)

At the end overallRootsList will be a list of n potentialRoots, each with a list of subNodes. The one with the largest subNodes list is your BST.

Since there are < treeHeight values in ancestorList, then (assuming the tree is balanced), the algorithm runs in O(n log n)

Interesting question!

My earlier attempt was moronically wrong!

Here is another attempt (hopefully correct this time).

I am assuming the tree is connected.

Suppose for each node n of the tree, you had a set of descendants of n, S_n with the property that

For each member x of S_n, the unique path from n to x is a Binary Search Tree (it is only a path, but you can still consider it a tree).
For every descendant y of x, such that the path from n to y is a BST, y is in S_n.

The set of nodes S_n, gives you the largest BST rooted at n.

We can construct S_n for each node by doing a depth first search on the tree, and passing in the path information (the path from root to the current node) and updating the sets of the nodes in the path by backtracking along the path.

When we visit a node, we walk up the path, and check if the BST property is satisfied for that segment of the path walked up so far. If so, we add the current node to the corresponding set of the node of the path we just walked to. We stop walking the path the moment the BST property is violated. Checking if the path segment we walked so far is a BST can be done in O(1) time, for a O(path_length) time total processing time, for each node.

At the end, each node will have its corresponding S_n populated. We can walk the tree now and pick the node with the largest value of S_n.

The time taken for this is the sum of depths of the nodes (in the worst case), and that is O(nlogn) in the average case (see Section 5.2.4 of http://www.toves.org/books/data/ch05-trees/index.html), but O(n^2) in the worst case.

Perhaps a cleverer way to update the sets will guarantee a reduction in the worst case time.

The pseudo-code could be something like:

static Tree void LargestBST(Tree t)
{
    LargestBST(t, new List<Pair>());
    // Walk the tree and return the largest subtree with max |S_n|.
}

static Tree LargestBST(Tree t, List<Pair> path)
{
    if (t == null) return;

    t.Set.Add(t.Value);

    int value = t.Value;
    int maxVal = value;
    int minVal = value;

    foreach (Pair p in path)
    {
        if (p.isRight)
        {
            if (minVal < p.node.Value)
            {
                break;
            }
        }

        if (!p.isRight)
        {
            if (maxVal > p.node.Value)
            {
                break;
            }
        }

        p.node.Set.Add(t.Value);

        if (p.node.Value <= minVal)
        {
            minVal = p.node.Value;
        }

        if (p.node.Value >= maxVal)
        {
            maxVal = p.node.Value;
        }
    }

    Pair pl = new Pair();
    pl.node = t;
    pl.isRight = false;

    path.Insert(0, pl);
    LargestBST(t.Left, path);

    path.RemoveAt(0);

    Pair pr = new Pair();
    pr.node = t;
    pr.isRight = true;

    path.Insert(0, pr);

    LargestBST(t.Right, path);

    path.RemoveAt(0);

}

LARGEST BINARY SEARCH TREE IN A BINARY TREE:

There are two ways we can approach this problem,

i)Largest BST not induced (From a node, all its children need not satisfy the BST condition)

ii)Largest BST induced (From a node , all its children will satisfy the BST condition)

We will discuss about the largest BST(Not Induced) here. We will follow bottom up approach(Post order traversal) to solve this.

a)Reach the leaf node

b)A tree node(from the leaf) will return a TreeNodeHelper object which has the following fields in it.

public static class TreeNodeHelper {
        TreeNode node;
        int nodes;
        Integer maxValue;
        Integer minValue;
        boolean isBST;


        public TreeNodeHelper() {}

        public TreeNodeHelper(TreeNode node, int nodes, Integer maxValue, Integer minValue, boolean isBST) {
            this.node = node;
            this.nodes = nodes;
            this.maxValue = maxValue;
            this.minValue = minValue;
            this.isBST = isBST;
        }      
    }

c)Initially from the leaf node, nodes=1,isBST=true,minValue=maxValue=node.data . And further, the nodes count will be increased if it satisfies the BST condition.

d)With the help of this, we will check the BST condition with current node. And we will repeat the same till root.

e)From each node two objects will be returned. one for last maximum BST and another one for current BST satisfying nodes. So from each node(above leaf) (2+2)=4 (2 for left subtree and 2 for right sub tree) objects will be compared and two will be returned.

f) The final maximum node object from root will be the largest BST

PROBLEM:

There is a problem in this approach. While following this approach, if a subtree is not satisfying the BST condition with the current node, we can't simply ignore the subtree(even it has less number of nodes). For example

From the leaf nodes(20,110) the objects will be tested with node(105), it satisfies the condition. But when it reaches node(95) the leaf node(20) does not satisfy the BST condition. Since this solution is for BST(Not induced) we should not ignore node(105) and node(110) which satisfies the condition. So from node(95) we have to backtrack again testing BST condition and catch those nodes(105, 110).

The complete code for this implementation is available in this link

https://github.com/dineshappavoo/Implementation/tree/master/LARGEST_BST_IN_BT_NOT_INDUCED_VER1.0

A binary search tree will give you a sorted result if you do a IN-ORDER Traversal. So, do an in-order traversal for the entire binary tree. The longest sorted sequence is your largest binary search sub tree.

Do a inorder traversal of elements (VISIT LEFT, VISIT ROOT, VISIT RIGHT)
While doing so, get the node data, compare whether the previous node data is lesser than the next data. If so, increment counter by 1. Store the start node.
When the comparison fails, store the end node and reset counter to 0
Store this information (counter,start,end) node in an array structure to later find which is having the maximum value and that will give you the longest binary search sub tree

GetLargestSortedBinarySubtree(thisNode, ref OverallBestTree)
    if thisNode == null
        Return null
    LeftLargest = GetLargestSortedBinarySubtree(thisNode.LeftNode, ref OverallBestTree)
    RightLargest = GetLargestSortedBinarySubtree(thisNode.RightNode, ref OverallBestTree)
    if LeftLargest.Max < thisNode.Value & RightLargest.Min > thisNode.Value
        currentBestTree = new BinaryTree(LeftLargest, thisNode.Value, RightLargest)
    else if LeftLargest.Max < thisNode.Value
        currentBestTree = new BinaryTree(LeftLargest, thisNode.Value, null)
    else if RightLargest.Min > thisNode.Value
        currentBestTree = new BinaryTree(null, thisNode.Value, RightLargest)
    else
        currentBestTree = new BinaryTree(null, thisNode.Value, null)
    if (currentBestTree.Size > OverallBestTree.Size)
        OverallBestTree = currentBestTree
    return currentBestTree

As BlueRaja pointed out, this algorithm is not correct.

It should really be called GetLargestSortedBinarySubtreeThatCanBeRecursivelyConstructedFromMaximalSortedSubtrees.

root(Tree L A R) = A

MaxBST(NULL) = (true, 0, NULL)
MaxBST(Tree L A R as T) = 
  let
    # Look at both children
    (L_is_BST, L_size, L_sub) = MaxBST(L)
    (R_is_BST, R_size, R_sub) = MaxBST(R)
  in
  # If they're both good, then this node might be good too
  if L_is_BST and R_is_BST and (L == NULL or root(L) < A) and (R == NULL or A < root(R))
  then (true, 1 + L_size + R_size, T)
  else
       # This node is no good, so give back the best our children had to offer
       (false, max(L_size, R_size), if L_size > R_size then L_sub else R_sub)

Looks at each tree node exactly once, so runs in O(N).

Edit: Crud, this doesn't consider that it can leave out some parts of a subtree. When I read subtree, I assumed "the entire tree rooted at some node". I may come back to fix this later.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow