Question

Greetings Overflowers,

  • The data structure is an acyclic tree of arbitrary number of nodes.
  • Shallower nodes are dependent on the results of deeper nodes.
  • The final result can simply be calculated by traversing the tree recursively.
  • If I had unlimited threads, I would assign each node one thread or even more.
  • Threads assigned to shallower nodes would wait for those of deeper nodes to finish.
  • However, I only have limited threads; sometimes more, sometimes less than total nodes.

Any thoughts on how to traverse such trees and eventually get the final results with limited threads ?

Regards

Was it helpful?

Solution

You want to look at Topological Ordering of directed acyclic graphs (DAGs).

A topological ordering of a DAG gives you the order in which things must be done if the nodes in the graph represent "jobs". The Wikipedia page has the algorithm to arrive at the order.

Once you have the order, your worker threads would just start consuming "jobs" (elements) from this ordering. Before starting the job, the thread would need to check to see if the dependent jobs are finished, but they should be either finished, or in progress by another thread.

Since you have a tree structure, you could special case this a bit: simply put the child nodes first, then their parents, etc., doing each level of the tree after another.

Also, throwing an "unlimited" number of threads at the problem raises an eyebrow... Unless your jobs are typically I/O bound, it would seem that (# of CPUs + some constant) threads would be appropriate.

OTHER TIPS

  1. You should always consider using thread pool, but not unlimited numer of threads.
  2. The class library provides a flexible thread pool implementation along with some useful predefined configurations. You can create a thread pool by calling one of the static factory methods in Executors. I think for your case the following method should be used:

    newFixedThreadPool - creates a fixed-size thread pool creates threads as tasks are submitted, up to the maximum pool size, and then attempts to keep the pool size constant (adding new threads if a thread dies due to an unexpected Exception).

    During thread pool creation you set pool size, but you may add as many threads as you wish to the executor (thread per your node for example). Threads that will not be executed, will be queued.

  3. If you have a batch of computations to submit to an Executor and you want to retrieve their results as they become available, you may use a completion service. CompletionService combines the functionality of an Executor and a BlockingQueue. You can submit Callable tasks to it for execution and use the queue like methods take and poll to retrieve completed results, packaged as Futures, as they become available. ExecutorCompletionService implements CompletionService, delegating the computation to an Executor.

    Here is an example of using CompletionService from Java in Concurrency book:

    public class Renderer {
        private final ExecutorService executor;
    
    
    
    Renderer(ExecutorService executor) { this.executor = executor; }
    
    
    void renderPage(CharSequence source) {
        final List<ImageInfo> info = scanForImageInfo(source);
        CompletionService<ImageData> completionService =
            new ExecutorCompletionService<ImageData>(executor);
        for (final ImageInfo imageInfo : info)
            completionService.submit(new Callable<ImageData>() {
                 public ImageData call() {
                     return imageInfo.downloadImage();
                 }
            });
    
    
        renderText(source);
    
    
        try {
            for (int t = 0, n =  info.size(); t < n;  t++) {
                Future<ImageData> f = completionService.take();
                ImageData imageData = f.get();
                renderImage(imageData);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw launderThrowable(e.getCause());
        }
    }
    
    }

For CPU intensive tasks, the optimal number of threads is likely to be the number of cores or logical threads you have. This could be 4, 6 or 8 on your machine. A simple way to create a thread pool which matches the number of physical threads available is.

ExecutorService pool = Executors.newFixedThreadPool(
                       Runtime.getRuntime().availableProcessors());

If you have more threads than cores, your threads will have to context switch adding overhead and making them slower.

This looks like a good fit for the Fork/Join framework being developed for Java 7; it's also available for Java 6 at http://gee.cs.oswego.edu/dl/concurrency-interest/ (under jsr166y). You can use the RecursiveTask class to represent your computation and fork additional tasks for child nodes in your data structure. There's a short tutorial at http://download.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top