Question

How to use exactly 8 threads for the 'expensive' parts all the time?

I have a number-crunching problem for which I created a simple framework. My problem is finding an elegant, simple way to use all CPU cores optimally.

To get good performance I use a thread pool with a fixed size of 8, the idea being to use as many threads as there are hardware threads for optimal performance.

Simplified pseudo code usage of the framework is as follows:

interface Task {
  data[] compute(data[]);
}

Task task = new Loop(new Chain(new DoX(), new DoY(), new Split(2, new DoZ())));
result = task.compute(data);
  • Loop Task would loop until some termination criteria is met
  • Chain Task would chain tasks (e.g. in the above r = t1.compute(r); r = t2.compute(r); r = t3.compute(r); return r;)
  • Split Task would split the data and execute a task on the parts (e.g. create 2 parts and return new data[] {t1.compute(part1), t1.compute(part2)})
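The pseudocode could look roughly like this in Java (the `double[]` data type, the fixed-iteration `Loop`, and all class names are my assumptions for illustration; `Split` is omitted here because its threading is the whole question):

```java
import java.util.Arrays;

// A minimal sketch of the framework from the question; the concrete
// data type (double[]) and the class bodies are assumptions.
interface Task {
    double[] compute(double[] data);
}

// Chain runs its sub-tasks in sequence, feeding each result to the next.
class Chain implements Task {
    private final Task[] tasks;
    Chain(Task... tasks) { this.tasks = tasks; }
    public double[] compute(double[] data) {
        double[] r = data;
        for (Task t : tasks) r = t.compute(r);
        return r;
    }
}

// Loop repeats a task until a termination criterion is met
// (a fixed iteration count stands in for the real criterion).
class Loop implements Task {
    private final int iterations;
    private final Task body;
    Loop(int iterations, Task body) { this.iterations = iterations; this.body = body; }
    public double[] compute(double[] data) {
        double[] r = data;
        for (int i = 0; i < iterations; i++) r = body.compute(r);
        return r;
    }
}

public class FrameworkSketch {
    public static void main(String[] args) {
        Task addOne = data -> Arrays.stream(data).map(x -> x + 1).toArray();
        Task twice  = data -> Arrays.stream(data).map(x -> x * 2).toArray();
        Task pipeline = new Loop(2, new Chain(addOne, twice));
        double[] out = pipeline.compute(new double[] { 1.0 });
        // ((1+1)*2 + 1)*2 = 10
        if (out[0] != 10.0) throw new AssertionError("expected 10.0, got " + out[0]);
        System.out.println("ok");
    }
}
```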

The threading is implemented in the Split Task at the moment. So the Split Task would hand the computation of t1.compute(part1) and t1.compute(part2) to the thread pool.

Approach 1, probably complete deadlock

My first approach was for the Split Task to hold an array of futures and call get() on them one after another. But if that Split Task is nested inside another Split Task, the blocking wait in future.get() occupies the thread that the outer Split Task took from the pool, so fewer than 8 threads are really working. If the hierarchy is deep enough, nobody may be working at all and the computation waits forever.

1) I assume future.get() will not return the thread to the thread pool, right? So done like that, I will wait in future.get() with no threads left to ever start the work? [I cannot easily test that because I already changed the approach]
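The behaviour question 1 asks about can be demonstrated on a deliberately tiny pool: future.get() does not return its thread to the pool, so a task that submits to its own pool and waits starves itself. The timeout below exists only so the demo terminates; without it, the inner get() blocks forever:

```java
import java.util.concurrent.*;

public class NestedGetDemo {
    public static void main(String[] args) throws Exception {
        // A pool of size 1 makes the starvation immediate; with size 8
        // the same thing happens once the Split hierarchy is deep enough.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        Future<String> outer = pool.submit(() -> {
            Future<String> inner = pool.submit(() -> "inner done");
            // Without the timeout this get() would block forever: the only
            // pool thread is the one waiting here, so 'inner' never starts.
            return inner.get(200, TimeUnit.MILLISECONDS);
        });
        try {
            outer.get();
            throw new AssertionError("expected starvation");
        } catch (ExecutionException e) {
            // The inner get() timed out because no pool thread was free.
            if (!(e.getCause() instanceof TimeoutException))
                throw new AssertionError("unexpected cause: " + e.getCause());
            System.out.println("inner task starved as expected");
        } finally {
            pool.shutdownNow();
        }
    }
}
```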

Approach 2, the current one, at least somebody working

My current approach (not much better) is to run the final part (partN) of a split on the current thread. When it finishes, I check whether partN-1 has already been started; if yes, I wait for all tasks in future.get(), otherwise the current thread computes partN-1 too, and if needed partN-2, and so on. That way I should always have at least one pool thread working.
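A simplified sketch of this approach (the partN-1 bookkeeping is omitted, and `splitCompute` with integer-sum jobs is made up for illustration): the calling thread runs the last part itself before it blocks on the remaining futures.

```java
import java.util.*;
import java.util.concurrent.*;

public class CallerRunsSplit {
    // Submit all parts but the last, run the last part on the calling
    // thread, then wait for the rest. The "check whether partN-1 was
    // already started, otherwise run it ourselves" refinement from the
    // question is omitted for brevity.
    static int[] splitCompute(ExecutorService pool, List<int[]> parts) throws Exception {
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < parts.size() - 1; i++) {
            int[] part = parts.get(i);
            futures.add(pool.submit(() -> sum(part)));
        }
        int[] results = new int[parts.size()];
        // The current thread stays busy with the last part instead of
        // going straight into a blocking get().
        results[parts.size() - 1] = sum(parts.get(parts.size() - 1));
        for (int i = 0; i < futures.size(); i++) {
            results[i] = futures.get(i).get(); // still blocks on deep nesting
        }
        return results;
    }

    static int sum(int[] a) { int s = 0; for (int x : a) s += x; return s; }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        int[] r = splitCompute(pool,
                Arrays.asList(new int[]{1, 2}, new int[]{3, 4}, new int[]{5}));
        pool.shutdown();
        if (!Arrays.equals(r, new int[]{3, 7, 5}))
            throw new AssertionError(Arrays.toString(r));
        System.out.println("ok");
    }
}
```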

But since the answer to question 1) is probably that future.get() blocks my threads, with this approach I will still have only a few working threads on deep hierarchies.

Approach 3, the only solution I see

I assume I must use two thread pools: a fixed-size one for the hard work and (a dynamic?) one for all the waiting.

3.a.: But that means the Split Task must only take threads from the waiting pool, while a Task doing real work spawns a thread from the work pool and waits for it to complete. Ugly, but it should work. Ugly because at the moment all threading support lives in the Split Task, whereas with this solution the Tasks doing the hard work must also know about threading.

3.b.: Another approach would be for Split to spawn worker threads, but inside Split each wait must be done by a waiting thread while the current thread also runs worker tasks in the meantime. With this, all threading support stays in the Split Task class, but I'm not sure how to implement it.

2a) How can I wait for the tasks without blocking the current thread?

2b) Can I return the current thread to the worker pool, let a waiter thread do the waiting, and then, after the wait, continue on the previous thread or another thread from the worker pool? How?

Other solutions

Don't use a fixed-size thread pool.

3) Is my idea to have 8 threads wrong? But how many then, if the hierarchies can be deep? And isn't there a risk that the JVM starts many tasks in parallel and switches between them a lot?

4) What do I miss or what would you do to solve that problem?

Thanks a lot and regards


[EDIT]

Accepted Solution and why I try something different (based on approach 2)

I accepted the ForkJoinPool as the correct solution.

However, some of the details, the possible overhead, and the loss of control make me want to try another approach. But the more I think about it, the more I come back to using ForkJoinPool (see the note at the end for the reason). Sorry for the amount of text.

http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ForkJoinPool.html

"However, no such adjustments are guaranteed in the face of blocked IO or other unmanaged synchronization."

"maximum number of running threads to 32767"

http://homes.cs.washington.edu/~djg/teachingMaterials/grossmanSPAC_forkJoinFramework.html

"The documentation for the ForkJoin framework suggests creating parallel subtasks until the number of basic computation steps is somewhere over 100 and less than 10,000."

The 'hard work' Tasks read a lot of data from disk and are very far from 10,000 basic computations. I could actually fork/join them down to maybe acceptable levels, but that is too much work right now because that part of the code is rather complex.

I think approach 3a is basically a hand-rolled ForkJoin, except that I would have more control and probably less overhead, and the problems just mentioned should not exist (though there is no automatic adaptation to the CPU resources provided by the OS; I will force the OS to give me what I want if I have to).

I may try approach 2 with some changes: that way I can work with an exact thread count and I don't have any waiting threads; ForkJoinPool seems to work with waiting threads if I understand it correctly.

The current thread runs jobs until all jobs in this Split instance are being run by a worker thread (so work stealing in the Split node as before), but then it does not call future.get(); it just checks whether all futures are ready with future.isDone(). If not, it steals a job from the thread pool and executes it, then checks the futures again. That way I never wait as long as there is a single job that is not yet running.

The ugly part: if there is no job to steal, I would have to sleep for a short time and then check the futures again or steal a new job from the pool. (Is there a way to wait for multiple Futures to all complete, with a timeout that does not cancel the computations when it triggers?)

So I think I have to use a CompletionService around the ThreadPool in each Split Task; then I can poll with a timeout and do not need to sleep.
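A sketch of that CompletionService idea (class name and the square-number jobs are made up for illustration): poll with a timeout and, in the real framework, use the timeout window to steal a job instead of sleeping. The stealing itself is only a comment here.

```java
import java.util.concurrent.*;

public class PollingSplit {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Several CompletionServices can wrap the same pool; each Split
        // instance would own one and only sees its own completions.
        CompletionService<Integer> cs = new ExecutorCompletionService<>(pool);
        int jobs = 4;
        for (int i = 0; i < jobs; i++) {
            int n = i;
            cs.submit(() -> n * n);
        }
        int sum = 0, done = 0;
        while (done < jobs) {
            // poll() with a timeout never cancels the computations; it just
            // returns null if nothing finished in time, and that window is
            // where a real Split could steal and run another job itself.
            Future<Integer> f = cs.poll(50, TimeUnit.MILLISECONDS);
            if (f != null) { sum += f.get(); done++; }
        }
        pool.shutdown();
        if (sum != 0 + 1 + 4 + 9) throw new AssertionError("sum=" + sum);
        System.out.println("sum=" + sum);
    }
}
```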

Assumption: the ThreadPool inside the CompletionService can still be used like a normal ThreadPool (e.g. for job stealing), and one ThreadPool can back many CompletionServices.

I think this is the optimal solution for the problem detailed in the question. However, there is a small problem with that, see the following.

Note:

After looking at the 'hard' tasks again, I see that many of their instantiations can themselves be parallelized, so adding threading there too is the next logical step. They are always leaf nodes, and the work they do fits a CompletionService well (in some cases sub-jobs have different runtimes, but any two results can be combined into a new job). Doing them on the ForkJoinPool would require managedBlock() and an implementation of ForkJoinPool.ManagedBlocker, which makes the code more complex. At the same time, using a CompletionService in these leaf nodes means my approach-2-based solution will probably need waiting threads too, so I may be better off with ForkJoinPool after all.
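For reference, the managedBlock() plumbing mentioned above looks roughly like this (a sketch; `joinFuture` is a hypothetical helper that wraps a plain Future, and commonPool() requires Java 8). The pool may add a compensating worker while the blocker waits:

```java
import java.util.concurrent.*;

public class ManagedBlockDemo {
    // Adapts a plain Future to ForkJoinPool's ManagedBlocker protocol so
    // the pool can compensate with an extra thread while we wait.
    static <T> T joinFuture(Future<T> f) throws InterruptedException, ExecutionException {
        ForkJoinPool.managedBlock(new ForkJoinPool.ManagedBlocker() {
            public boolean block() throws InterruptedException {
                try { f.get(); } catch (ExecutionException ignored) { }
                return true; // done blocking
            }
            public boolean isReleasable() { return f.isDone(); }
        });
        return f.get(); // non-blocking now: the future is already done
    }

    public static void main(String[] args) throws Exception {
        ExecutorService io = Executors.newSingleThreadExecutor();
        Future<String> slow = io.submit(() -> { Thread.sleep(100); return "data"; });
        // Run the blocking wait from inside a ForkJoin worker thread.
        String result = ForkJoinPool.commonPool()
                .submit(() -> joinFuture(slow)).join();
        io.shutdown();
        if (!"data".equals(result)) throw new AssertionError(result);
        System.out.println(result);
    }
}
```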


Solution 3

I had to move away from ForkJoinPool; it did not use the threads optimally. It worked fine for the Loop and Split nodes, but no longer worked when I wanted to parallelize the leaf nodes where the actual work happens: when I add those as RecursiveTasks, most threads sit idle. The join() call does not steal work in the leaves for some reason (jdk1.7.0_45); it just waits. In my case all the work is in the leaves, so using a custom RecursiveTask subclass for the leaves is worse than using it only for Loop and Split nodes (because it waits after part of the work instead of after all of it). I don't think I'm using ForkJoinPool wrong; if you google, you find people with similar problems.

I now went with a simple solution: two thread pools, one fixed-size for the actual hard work and a cached one for all the Loop and Split nodes. I created FakeRecursiveTask (extend this instead of the original) so I did not have to change the code for Loop and Split. The leaves use HardWork as a base class, just so it is clear that they are something different; you simply call doHardWork(work).

With this solution all my worker threads are fully used all the time. Since the tree has limited size I should never run out of helper threads; in my case it mostly uses about the same number of helper threads as worker threads (so 8).

import java.util.List;
import java.util.concurrent.*;

public class ThreadPool3 {
        private static int maxNumWorkerThreads;
        private static ExecutorService workerPool = null;
        private static ExecutorService helperPool = null;

        public static void initThreadPool(int maxNumWorkerThreads_) {
                int availProcessors = Runtime.getRuntime().availableProcessors();
                if (maxNumWorkerThreads_ <= 0) {
                        maxNumWorkerThreads_ = availProcessors;
                }
                maxNumWorkerThreads = maxNumWorkerThreads_;

                if (availProcessors != maxNumWorkerThreads) {
                        System.out.println("WARN: maxNumWorkerThreads (" + maxNumWorkerThreads + ") != availProcessors (" + availProcessors + ")");
                }
                workerPool = Executors.newFixedThreadPool(maxNumWorkerThreads);
                BlockingQueue<Runnable> workQueue = new SynchronousQueue<Runnable>();
                helperPool = new ThreadPoolExecutor(0, 4 * maxNumWorkerThreads, 60, TimeUnit.MINUTES, workQueue,
                                Executors.defaultThreadFactory(), new ThreadPoolExecutor.CallerRunsPolicy());
        }


        public static abstract class HardWork implements Callable<Void> {
                @Override
                public abstract Void call() throws Exception;
        }

        public static void doHardWork(List<HardWork> tasks) throws Exception {
                workerPool.invokeAll(tasks);
        }


        /**
         * fake ForkJoinPool interface:
         */
        public static abstract class FakeRecursiveTask<T> implements Callable<T> {
                private Future<T> resultFuture = null;

                /**
                 * fake interface:
                 */
                public abstract T compute();

                /**
                 * fake interface:
                 */
                public T invoke() {
                        return compute();
                }

                /**
                 * fake interface:
                 */
                public void fork() {
                        resultFuture = helperPool.submit(this);
                }

                /**
                 * fake interface:
                 */
                public T join() {
                        try {
                                return resultFuture.get();
                        }
                        catch (Exception e) {
                                throw new RuntimeException(e);
                        }
                }

                @Override
                public T call() throws Exception {
                        return compute();
                }
        }


        public static void shutdownThreadPool() {
                if (workerPool != null) {
                        workerPool.shutdown();
                }
                if (helperPool != null) {
                        helperPool.shutdown();
                }
        }
}
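The two-pool pattern can be exercised with a self-contained miniature (`TwoPoolDemo`, the pool sizes, and the square-number jobs are made up; it uses simplified stand-ins rather than the FakeRecursiveTask/HardWork classes above):

```java
import java.util.*;
import java.util.concurrent.*;

public class TwoPoolDemo {
    // Fixed-size pool: exactly this many threads ever do hard work.
    static final ExecutorService workerPool = Executors.newFixedThreadPool(2);
    // Cached pool: inner (Loop/Split) nodes that mostly wait live here,
    // so their blocked threads never occupy a worker slot.
    static final ExecutorService helperPool = Executors.newCachedThreadPool();

    // Leaf work goes through invokeAll on the worker pool, mirroring
    // the HardWork/doHardWork pattern.
    static int leafSum(List<Integer> xs) throws Exception {
        List<Callable<Integer>> jobs = new ArrayList<>();
        for (int x : xs) jobs.add(() -> x * x);
        int s = 0;
        for (Future<Integer> f : workerPool.invokeAll(jobs)) s += f.get();
        return s;
    }

    public static void main(String[] args) throws Exception {
        // An inner "Split" node: fork two halves on the helper pool and
        // block there, like FakeRecursiveTask.fork()/join().
        Future<Integer> left  = helperPool.submit(() -> leafSum(Arrays.asList(1, 2)));
        Future<Integer> right = helperPool.submit(() -> leafSum(Arrays.asList(3, 4)));
        int total = left.get() + right.get();
        workerPool.shutdown();
        helperPool.shutdown();
        if (total != 1 + 4 + 9 + 16) throw new AssertionError("total=" + total);
        System.out.println("total=" + total);
    }
}
```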

OTHER TIPS

You seem to have a parallel "divide and conquer" type problem, where you are recursively splitting the problem into sub-problems to be "solved" using the available cores.

You are correct that a naive implementation that creates threads is likely to use a lot of resources, and that using a bounded thread pool will most likely deadlock.

The third alternative is the "fork/join" model implemented in Java 7. This is described in the Oracle Java tutorial, but I think that Dan Grossman's lecture notes do a better job of explaining it.
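A minimal RecursiveTask example of that fork/join model (a plain array sum, not the asker's framework; the threshold is artificially small for demonstration):

```java
import java.util.concurrent.*;

public class ForkJoinSum extends RecursiveTask<Long> {
    private final long[] data;
    private final int from, to;
    private static final int THRESHOLD = 4; // real code uses a much larger cutoff

    ForkJoinSum(long[] data, int from, int to) {
        this.data = data; this.from = from; this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {           // small enough: solve directly
            long s = 0;
            for (int i = from; i < to; i++) s += data[i];
            return s;
        }
        int mid = (from + to) / 2;
        ForkJoinSum left  = new ForkJoinSum(data, from, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, to);
        left.fork();                            // run the left half asynchronously
        return right.compute() + left.join();   // this thread computes the right half
    }

    public static void main(String[] args) {
        long[] data = new long[100];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        long sum = new ForkJoinPool().invoke(new ForkJoinSum(data, 0, data.length));
        if (sum != 5050) throw new AssertionError("sum=" + sum);
        System.out.println(sum);
    }
}
```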

To avoid deadlocks completely, just do not use the synchronous Future.get(). Use the asynchronous composition methods of CompletableFuture instead (thenApply and thenCombine), available in Java 8. These methods do not block; they submit new tasks when the data becomes available. If you don't want to use Java 8, look at the Guava library, which (I believe) has equivalent facilities (ListenableFuture). Other asynchronous libraries exist, e.g. my own https://github.com/rfqu/df4j. Its advantage is that task objects can be reused, so fewer objects have to be created. If you provide a more detailed description of your problem (say, in ordinary sequential form, or written as if you had an infinite number of threads), I can help you implement your program with df4j.
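A minimal Java 8 sketch of this non-blocking style (the values and pool size are arbitrary): the combining steps run as new tasks when their inputs are ready, and only the top-level caller ever waits.

```java
import java.util.concurrent.*;

public class AsyncCompose {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // No pool thread ever blocks in get(): each stage is scheduled
        // when its inputs become available.
        CompletableFuture<Integer> a = CompletableFuture.supplyAsync(() -> 6, pool);
        CompletableFuture<Integer> b = CompletableFuture.supplyAsync(() -> 7, pool);
        int result = a.thenCombine(b, (x, y) -> x * y) // wait for both, combine
                      .thenApply(v -> v + 1)           // then transform the result
                      .join();                         // only the caller waits here
        pool.shutdown();
        if (result != 43) throw new AssertionError("result=" + result);
        System.out.println(result);
    }
}
```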

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow