Question

I have a function to traverse a binary tree recursively. As the operations are compute intensive I thought of spawning multiple threads in the recursive function using tasks as follows:

static void Traverse<T>(Tree<T> node, Action<T> action) 
{ 
 if (node == null) return; 
 var t1 = Task.Factory.StartNew(() => action(node.Data)); 
 var t2 = Task.Factory.StartNew(() => Traverse(node.Left, action)); 
 var t3 = Task.Factory.StartNew(() => Traverse(node.Right, action)); 
 Task.WaitAll(t1, t2, t3); 
} 

Now this does seem to work. However I want to know if there is anything I need to be careful about while using tasks in recursive fashion. E.g if the depth of the tree is long could it somehow not able to create tasks for lower levels and wait for other tasks to finish (which would never finish as they are waiting for lower level tasks to finish)?

Was it helpful?

Solution

If the tree is very large spawning that many tasks could cause problems completely exhausting the entire thread pool causing performance problems elsewhere, this is because there is no dependency between the nodes and their parents so all will attempt to run concurrently. What I would do is have your Tree<T> class implement IEnumerable<T> that will return it's own Data property and all it's children's Data properties then use Parallel.ForEach

static void Traverse<T>(Tree<T> node, Action<T> action) 
{
    Parallel.ForEach(node, action);
}


//Elsewhere
class Tree<T> : IEnumerable<T>
{
    Tree<T> Left { get; set; }
    Tree<T> Right { get; set; } 
    T Data { get; set; }

    public IEnumerator<T> GetEnumerator()
    {
        yield return this.Data;

        if (Left != null)
        {
            foreach (var left in Left)
            {
                yield return left.Data;
            }
        }

        if (Right != null)
        {
            foreach (var right in Right)
            {
                yield return right.Data;
            }
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

The only "Gotcha" you need to concern yourself with is if there are any closed loops in the tree where a child could be the parent of a higher level node which would cause infinite recursion.


EDIT: Here is a new version that does not use recursion on the GetEnumerator and instead uses a Stack<Tree<T>> object to hold the state so if you had extremely tall trees you can't have a StackOverflowException. Also if you remove the comments from the commented lines it will stop the "infinite recursion" problem that the previous version had. But if you know you won't have any looping structures it is not necessary, so I left it commented out.

class Tree<T> : IEnumerable<T>
{
    Tree<T> Left { get; set; }
    Tree<T> Right { get; set; }
    T Data { get; set; }

    public IEnumerator<T> GetEnumerator()
    {
        Stack<Tree<T>> items = new Stack<Tree<T>>();
        //HashSet<Tree<T>> recursiveCheck = new HashSet<Tree<T>>();

        items.Push(this);
        //recursiveCheck.Add(this);

        while (items.Count > 0)
        {
            var current = items.Pop();

            yield return current.Data;

            if (current.Left != null)
                //if(recursiveCheck.Add(current.Left))
                    items.Push(current.Left);
            if (current.Right != null)
                //if (recursiveCheck.Add(current.Right))
                    items.Push(current.Right);
        }

    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

OTHER TIPS

Like you said, spawning threads recursively doesn't seem to be a good idea and if your tree is long enough you'll end up with either lots of thread which will be slower because there will be lots of overheads, or you'll eventually hit the limit of parallel threads in your program. So I suggest you use a ThreadPool instead to manage your threads.

You may have one thread to navigate the tree, and another two threads to do the heavy work. You should also note that using thread won't be good unless you have some blocking operations like I/O read/write or some networking going on. If you don't, it's probably best to use only one thread for your heavy work and one for traversing the tree.

I do not think that it would stop working at any point, but the use of multiple threads can increase CPU usage as the computer is doing more operations at once so it may be safer, yet slower, to not use multiple threads and just use the following:

static void Traverse<T>(Tree<T> node, Action<T> action)
{
 if (node == null) return;
 action(node);
 Traverse(node.Left, action);
 Traverse(node.Right, action);
}

This will be slower though so if you are worried about how fast it runs you will want to use your original version.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top