Implementing an iterator over a binary search tree

https://stackoverflow.com/questions/4581576

14-10-2019

Question

I've been coding up a bunch of different binary search tree implementations recently (AVL, splay, treap) and am curious if there's a particularly "good" way to write an iterator to traverse these structures. The solution I've used right now is to have each node in the BST store pointers to the next and previous elements in the tree, which reduces iteration to a standard linked-list iteration. However, I'm not really satisfied with this answer. It increases the space usage of each node by two pointers (next and previous), and in some sense it's just cheating.

I know of a way of building a binary search tree iterator that uses O(h) auxiliary storage space (where h is the height of the tree) by using a stack to keep track of the frontier nodes to explore later on, but I've resisted coding this up because of the memory usage. I was hoping there is some way to build an iterator that uses only constant space.

My question is this - is there a way to design an iterator over a binary search tree with the following properties?

Elements are visited in ascending order (i.e. an inorder traversal)
next() and hasNext() queries run in O(1) time.
Memory usage is O(1)

To make it easier, it's fine if you assume that the tree structure isn't changing shape during the iteration (i.e. no insertions, deletions, or rotations), but it would be really cool if there was a solution that could indeed handle this.

Solution

The simplest possible iterator stores the last seen key. On each call, it searches the tree for the least upper bound of that key, i.e. the smallest key strictly greater than it. Each step costs O(h), which is O(log n) in a balanced tree. The advantage is that this is very simple, and if keys are small, the iterators are also small. Of course, the disadvantage is that it is a relatively slow way of iterating through the tree. It also won't work if the tree contains duplicate keys.
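A minimal Java sketch of the key-based approach, assuming a hypothetical Node class with unique int keys (KeySuccessorIterator is an illustrative name, not a library class):

```java
import java.util.NoSuchElementException;

// Hypothetical node type for this sketch: unique int keys, no parent pointers.
final class Node {
    final int key;
    Node left, right;

    Node(int key, Node left, Node right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }
}

// Iterator whose only state (besides the root) is the last key returned.
final class KeySuccessorIterator {
    private final Node root;
    private boolean started = false; // becomes true after the first next()
    private int lastKey;             // meaningful only once started

    KeySuccessorIterator(Node root) {
        this.root = root;
    }

    boolean hasNext() {
        return successor() != null;
    }

    int next() {
        Node n = successor();
        if (n == null) {
            throw new NoSuchElementException();
        }
        started = true;
        lastKey = n.key;
        return lastKey;
    }

    // Smallest node whose key is > lastKey (the minimum before the first call).
    // One root-to-leaf search: O(h) time, O(1) extra space.
    private Node successor() {
        Node best = null;
        Node n = root;
        while (n != null) {
            if (!started || n.key > lastKey) {
                best = n;    // candidate; a smaller one may lie to the left
                n = n.left;
            } else {
                n = n.right; // this node and its left subtree are too small
            }
        }
        return best;
    }
}
```

Because the state is only a key, inserting or deleting other keys between calls does not invalidate the iterator: each call simply finds the successor in the tree as it is at that moment. Note that hasNext() repeats the search, so a real implementation might cache its result.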

Some trees use exactly the approach you already use, because fast scanning is important for their specific use. If each node holds many keys, the penalty of storing sibling pointers isn't too onerous. Many B-tree variants use this method; B+ trees, for example, chain their leaves together.
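A minimal Java sketch of the linked approach, assuming a hypothetical Node class with an extra next field. For simplicity it builds the links once, for a tree that is not being modified; a real AVL tree or treap would keep them up to date inside insert and delete. SuccessorLinker and LinkedIterator are illustrative names:

```java
import java.util.NoSuchElementException;

// Hypothetical node type with an extra link to its in-order successor.
final class Node {
    final int key;
    Node left, right;
    Node next; // in-order successor, or null for the largest key

    Node(int key, Node left, Node right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }
}

// Sets every node's next link with one recursive in-order pass
// (the recursion itself uses O(h) call-stack space while building).
final class SuccessorLinker {
    private Node prev;  // last node visited so far
    private Node first; // smallest node

    static Node link(Node root) {
        SuccessorLinker l = new SuccessorLinker();
        l.visit(root);
        if (l.prev != null) {
            l.prev.next = null; // the largest node has no successor
        }
        return l.first;
    }

    private void visit(Node n) {
        if (n == null) {
            return;
        }
        visit(n.left);
        if (prev == null) {
            first = n;
        } else {
            prev.next = n;
        }
        prev = n;
        visit(n.right);
    }
}

// With the links in place, iteration is a plain linked-list walk:
// O(1) worst-case time per step and O(1) iterator state.
final class LinkedIterator {
    private Node cur;

    LinkedIterator(Node first) {
        cur = first;
    }

    boolean hasNext() {
        return cur != null;
    }

    int next() {
        if (cur == null) {
            throw new NoSuchElementException();
        }
        int key = cur.key;
        cur = cur.next;
        return key;
    }
}
```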

Many search tree implementations keep a parent pointer on each node to simplify other operations. If you have that, you can use a single pointer to the last seen node as your iterator's state. At each step, if the last seen node has a right subtree, descend to the leftmost node of that subtree. Otherwise, climb through the parent pointers until you move up from a left child; that parent is the next node. If you climb past the root without doing so, the iteration is finished.
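A minimal Java sketch of the parent-pointer approach, assuming a hypothetical Node class with int keys and left/right/parent fields (Node.of is an illustrative helper that wires up the parent links):

```java
import java.util.NoSuchElementException;

// Hypothetical node type with a parent pointer.
final class Node {
    final int key;
    Node left, right, parent;

    Node(int key) {
        this.key = key;
    }

    // Illustrative helper: builds a node and sets its children's parent links.
    static Node of(int key, Node left, Node right) {
        Node n = new Node(key);
        n.left = left;
        n.right = right;
        if (left != null) left.parent = n;
        if (right != null) right.parent = n;
        return n;
    }
}

// The whole iterator state is one node reference: O(1) extra space.
final class ParentPointerIterator {
    private Node upcoming; // node the next call to next() returns, or null

    ParentPointerIterator(Node root) {
        upcoming = (root == null) ? null : leftmost(root);
    }

    boolean hasNext() {
        return upcoming != null;
    }

    int next() {
        if (upcoming == null) {
            throw new NoSuchElementException();
        }
        Node result = upcoming;
        upcoming = successor(result);
        return result.key;
    }

    private static Node leftmost(Node n) {
        while (n.left != null) {
            n = n.left;
        }
        return n;
    }

    // In-order successor: the leftmost node of the right subtree if there is
    // one; otherwise climb until we arrive from a left child (null at the end).
    private static Node successor(Node n) {
        if (n.right != null) {
            return leftmost(n.right);
        }
        Node child = n;
        Node p = n.parent;
        while (p != null && child == p.right) {
            child = p;
            p = p.parent;
        }
        return p;
    }
}
```

Here hasNext() is O(1), while a single next() can take O(h) steps when it has to descend or climb a long path.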

If none of these techniques suit you, you can use a stack of nodes, stored in the iterator. This serves the same function as the function call stack when iterating through the search tree recursively, but instead of looping through siblings and recursing on children, you push children onto the stack and return each successive sibling.

OTHER TIPS

As TokenMacGuy mentioned, you can use a stack stored in the iterator. Here's a quick tested implementation of this in Java:

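import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

/**
 * An iterator that iterates through a tree using in-order tree traversal,
 * producing a sorted sequence. Uses O(h) extra space for the stack.
 * Assumes a Node class with `left` and `right` fields.
 */
public class Iterator {

    // Nodes whose left subtree is being explored and that have not been returned yet.
    private final Deque<Node> stack = new ArrayDeque<>();
    // Root of the next subtree still to be explored (null if none).
    private Node current;

    private Iterator(Node argRoot) {
        current = argRoot;
    }

    public Node next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Descend to the leftmost unvisited node, remembering the path.
        while (current != null) {
            stack.push(current);
            current = current.left;
        }

        Node node = stack.pop();
        // The next subtree to explore is this node's right subtree.
        current = node.right;

        return node;
    }

    public boolean hasNext() {
        return !stack.isEmpty() || current != null;
    }

    public static Iterator iterator(Node root) {
        return new Iterator(root);
    }
}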

Another variation is to traverse the tree at construction time and save the in-order sequence into a list. You can then use the list's iterator. Each step is then O(1), at the cost of O(n) extra space.
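A minimal Java sketch of the flattening variant, assuming a hypothetical Node class with int keys (FlattenedIterator is an illustrative name):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical node type for this sketch.
final class Node {
    final int key;
    Node left, right;

    Node(int key, Node left, Node right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }
}

final class FlattenedIterator {
    // Copies the keys into a list in sorted (in-order) order, then hands out
    // the list's own iterator: O(n) extra space, O(1) per step afterwards.
    static Iterator<Integer> of(Node root) {
        List<Integer> keys = new ArrayList<>();
        collect(root, keys);
        return keys.iterator();
    }

    private static void collect(Node n, List<Integer> out) {
        if (n == null) {
            return;
        }
        collect(n.left, out);
        out.add(n.key);
        collect(n.right, out);
    }
}
```

Because the list is a snapshot, later changes to the tree are not reflected in it, and the recursive collect() uses O(h) call-stack space while the list is being built.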

OK, I know this is old, but I was asked this in an interview at Microsoft a while back and decided to work on it a bit. I have tested this and it works quite well.

#include <cstddef>
#include <stack>

// Assumes a BSTNode<E> type with GetLeft() and GetRight() accessors.
template <typename E>
class BSTIterator
{
  BSTNode<E> * m_curNode;
  std::stack<BSTNode<E>*> m_recurseIter;

  // Push node and its whole chain of left descendants onto the stack.
  void pushLeftSpine(BSTNode<E> * node)
  {
      while(node != NULL)
      {
          m_recurseIter.push(node);
          node = node->GetLeft();
      }
  }

  // Pop the next node in order; NULL marks the end of the traversal.
  void advance()
  {
      if(m_recurseIter.empty())
      {
          m_curNode = NULL;
          return;
      }
      m_curNode = m_recurseIter.top();
      m_recurseIter.pop();
  }

public:
    // Passing NULL yields the "end" iterator.
    BSTIterator( BSTNode<E> * binTree )
    {
        pushLeftSpine(binTree);
        advance();
    }

    BSTNode<E> & operator*() const { return *m_curNode; }

    bool operator==(const BSTIterator<E>& other) const
    {
        return m_curNode == other.m_curNode;
    }

    bool operator!=(const BSTIterator<E>& other) const
    {
        return !(*this == other);
    }

    // The successor is the leftmost node of the right subtree, if any;
    // otherwise it is the nearest pending ancestor on top of the stack.
    BSTIterator<E> & operator++()
    {
        pushLeftSpine(m_curNode->GetRight());
        advance();
        return *this;
    }

    // Post-increment: advance, but return the previous state.
    BSTIterator<E> operator++ ( int )
    {
        BSTIterator<E> cpy = *this;
        ++(*this);
        return cpy;
    }
};

Tree traversal, from Wikipedia:

All sample implementations will require call stack space proportional to the height of the tree. In a poorly balanced tree, this can be quite considerable.

We can remove the stack requirement by maintaining parent pointers in each node, or by threading the tree. In the case of using threads, this will allow for greatly improved inorder traversal, although retrieving the parent node required for preorder and postorder traversal will be slower than a simple stack based algorithm.

In the article there is some pseudocode for iteration with O(1) state, which can be easily adapted to an iterator.
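One well-known way to get O(1) extra space is Morris traversal, which threads the tree temporarily; this may or may not match the exact pseudocode in the article. A sketch of it adapted to a Java iterator, assuming a hypothetical Node class with int keys (MorrisIterator is an illustrative name):

```java
import java.util.NoSuchElementException;

// Hypothetical node type; no parent pointers or extra fields are needed.
final class Node {
    final int key;
    Node left, right;

    Node(int key, Node left, Node right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }
}

// In-order iterator based on Morris traversal: O(1) extra space, amortized
// O(1) time per next(). It temporarily points some null right links back up
// the tree ("threads") and removes them again as it goes.
final class MorrisIterator {
    private Node current;

    MorrisIterator(Node root) {
        current = root;
    }

    boolean hasNext() {
        return current != null;
    }

    int next() {
        while (current != null) {
            if (current.left == null) {
                // No left subtree: current is the next node in order.
                Node out = current;
                current = current.right; // may follow a thread back up
                return out.key;
            }
            // Find current's in-order predecessor (rightmost node of the left subtree).
            Node pred = current.left;
            while (pred.right != null && pred.right != current) {
                pred = pred.right;
            }
            if (pred.right == null) {
                pred.right = current;    // create a thread, then descend left
                current = current.left;
            } else {
                pred.right = null;       // left subtree finished: remove the thread
                Node out = current;
                current = current.right;
                return out.key;
            }
        }
        throw new NoSuchElementException();
    }
}
```

Because the tree is modified while the iterator is live, this is not safe for concurrent readers, and abandoning the iteration part-way leaves some threads in place.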

What about using a depth-first search? The iterator object just needs a stack of the nodes on the current path that have been reached but not yet returned.

If you use a stack, you only achieve O(h) extra memory, where h is the height of the tree. However, if you want to use only O(1) extra memory, you need to record just a few nodes (below: the current node, the root and the rightmost node). Here is the analysis:
- If the current node has a right child: the next node is the min of the right subtree.
- If the current node has no right child: search for it from the root, keeping track of the lowest ancestor whose left subtree contains the current node; that ancestor is the next node.

public class Solution {
    //@param root: The root of binary tree.

    TreeNode current;   // next node to return
    TreeNode root;
    TreeNode rightMost; // largest node; iteration ends after it

    public Solution(TreeNode root) {
        if (root == null) return;
        this.root = root;
        current = findMin(root);
        rightMost = findMax(root);
    }

    //@return: True if there is a next node, or false
    public boolean hasNext() {
        return current != null && rightMost != null && current.val <= rightMost.val;
    }

    //O(1) extra memory; each call costs O(h) time.
    public TreeNode next() {
        TreeNode tep = current;
        current = updateNext();
        return tep;
    }

    public TreeNode updateNext() {
        if (!hasNext()) return null;
        //1. current has a right child: the next node is the min of the right subtree
        if (current.right != null) return findMin(current.right);
        //2. current has no right child: search from the root.
        //   If cur < node, remember node as a candidate and go left; otherwise, go right.
        //   The last candidate is the lowest ancestor greater than current.
        int curVal = current.val;
        TreeNode post = null;
        TreeNode tepRoot = root;
        while (tepRoot != null) {
            if (curVal < tepRoot.val) {
                post = tepRoot;
                tepRoot = tepRoot.left;
            } else if (curVal > tepRoot.val) {
                tepRoot = tepRoot.right;
            } else {
                break; // reached current itself
            }
        }
        return post;
    }

    public TreeNode findMin(TreeNode node) {
        while (node.left != null) {
            node = node.left;
        }
        return node;
    }

    public TreeNode findMax(TreeNode node) {
        while (node.right != null) {
            node = node.right;
        }
        return node;
    }
}

Use O(1) space, which means we will not use an O(h) stack.

To begin:

hasNext(): check current.val <= endNode.val to see whether the tree has been fully traversed.
Find the min via the left-most node: we can always look for the left-most node to find the next minimum value.
Once the left-most min has been returned (call it current), the next min falls into 2 cases: if current.right != null, keep looking for current.right's left-most child as the next min. Otherwise, we need to look backwards for a parent: use binary search from the root to find current's parent node.

Note: when doing the binary search for the parent, make sure current lies in the parent's left subtree, i.e. only record a node as the candidate parent when the search moves to its left child.

Because: if current lies in parent.right's subtree, that parent must have been visited already, since in a binary search tree parent.val < parent.right.val. We need to skip this case, since it leads to an infinite loop.

public class BSTIterator {
    public TreeNode root;
    public TreeNode current;
    public TreeNode endNode;
    //@param root: The root of binary tree.
    public BSTIterator(TreeNode root) {
        if (root == null) {
            return;
        }
        this.root = root;
        this.current = root;
        this.endNode = root;

        while (endNode != null && endNode.right != null) {
            endNode = endNode.right;
        }
        while (current != null && current.left != null) {
            current = current.left;
        }
    }

    //@return: True if there is a next node, or false
    public boolean hasNext() {
        return current != null && current.val <= endNode.val;
    }

    //@return: return next node
    public TreeNode next() {
        if (!hasNext()) {
            throw new java.util.NoSuchElementException();
        }
        TreeNode rst = current;
        //current node has right child: next is the left-most node of the right subtree
        if (current.right != null) {
            current = current.right;
            while (current.left != null) {
                current = current.left;
            }
        } else {//Current node does not have right child.
            current = findParent();
        }
        return rst;
    }

    //Find the lowest ancestor of current whose left subtree contains current
    //(current's in-order successor), or null if current is the last node.
    public TreeNode findParent(){
        TreeNode node = root;
        TreeNode parent = null;
        int val = current.val;
        if (val == endNode.val) {
            return null;
        }
        while (node != null) {
            if (val < node.val) {
                parent = node;
                node = node.left;
            } else if (val > node.val) {
                node = node.right;
            } else {//node.val == current.val
                break;
            }
        }
        return parent;
    }
}

Without extra links (such as the next/previous pointers in the question), next() cannot run in worst-case O(1) time. When you are looking at a particular node in a BST, you have no idea what the height and structure of the other nodes are, so you cannot just "jump" to the correct next node.
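A short accounting sketch shows that the slow calls are rare, assuming the parent-pointer successor walk described in an earlier answer (not the visited-flag code below). In a tree with n nodes there are n-1 edges, and a complete traversal, counting the initial descent to the smallest node, follows each edge at most once going down and at most once going up:

```latex
\sum_{i=1}^{n} \mathrm{moves}(\mathtt{next}_i) \;\le\; 2(n-1)
\quad\Longrightarrow\quad
\frac{1}{n}\sum_{i=1}^{n} \mathrm{moves}(\mathtt{next}_i) \;<\; 2
```

So next() is amortized O(1) over a full traversal, even though a single call can take Θ(h) moves.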

However, the space complexity can be reduced to O(1) (except for the memory for the BST itself). Here is the way I would do it in C:

#include <stddef.h>
#include <stdio.h>

struct node{
    int value;
    struct node *left, *right, *parent;
    int visited;
};

struct node* iter_next(struct node* node){
    struct node* rightResult = NULL;

    if(node==NULL)
        return NULL;

    while(node->left && !(node->left->visited))
        node = node->left;

    if(!(node->visited))
        return node;

    //move right
    rightResult = iter_next(node->right);

    if(rightResult)
        return rightResult;

    while(node && node->visited)
        node = node->parent;

    return node;
}

The trick is to have both a parent link and a visited flag in each node. In my opinion, we can argue that this is not additional space usage; it is simply part of the node structure. Obviously, iter_next() must be called while the tree structure is not changing, and the "visited" flags must not change either, except for the caller marking each returned node as visited. The flags also have to be cleared before the tree can be traversed again.

Here is the tester function that calls iter_next() and prints the value each time for this tree:

                  27
               /      \
              20      62
             /  \    /  \
            15  25  40  71
             \  /
             16 21

int main(){

    //right root subtree
    struct node node40 = {40, NULL, NULL, NULL, 0};
    struct node node71 = {71, NULL, NULL, NULL, 0};
    struct node node62 = {62, &node40, &node71, NULL, 0};

    //left root subtree
    struct node node16 = {16, NULL, NULL, NULL, 0};
    struct node node21 = {21, NULL, NULL, NULL, 0};
    struct node node15 = {15, NULL, &node16, NULL, 0};
    struct node node25 = {25, &node21, NULL, NULL, 0};
    struct node node20 = {20, &node15, &node25, NULL, 0};

    //root
    struct node node27 = {27, &node20, &node62, NULL, 0};

    //set parents
    node16.parent = &node15;
    node21.parent = &node25;
    node15.parent = &node20;
    node25.parent = &node20;
    node20.parent = &node27;
    node40.parent = &node62;
    node71.parent = &node62;
    node62.parent = &node27;

    struct node *iter_node = &node27;

    while((iter_node = iter_next(iter_node)) != NULL){
        printf("%d ", iter_node->value);
        iter_node->visited = 1;
    }
    printf("\n");
    return 0;
}

Which will print the values in sorted order:

15 16 20 21 25 27 40 62 71

Licensed under: CC-BY-SA with attribution
