Question

Imagine a time-consuming program that takes in a bunch of text files (100+ MB each), processes them, and stores the results in a database. I am trying to optimize it a bit by utilizing more cores (8 logical cores on this machine, a quad-core i7 with hyperthreading).

Consider the following piece of code:

    ExecutorService es = Executors.newCachedThreadPool(
            new ThreadFactory() {
                private final AtomicInteger threadNumber = new AtomicInteger(1);
                private final String namePrefix = "awesome-thread-";

                public Thread newThread(Runnable r) {
                    Thread t = new Thread(r, namePrefix + threadNumber.getAndIncrement());
                    if (t.isDaemon()) 
                        t.setDaemon(false);
                    return t;
                }
            });

    while((e = upp.getNextEntry()) != null){

        // start time-consuming process in a separate thread to speed up
        Future<Set<Fragment>> fut = es.submit(new FragmentTask(e.getSomeProperty()));       

        /* do other stuff #sequentially# with entry e
         * it may or may not take as long as previous step
         * depending on e 
         */

        Set<Fragment> set = fut.get(); 
        for(Fragment frag : set){
            // do stuff with frag
        }                       
    }

Here the FragmentTask holds a recursive algorithm which takes from a couple to several thousand milliseconds to execute, depending on e.

I had initially implemented the thread pool as a FixedThreadPool, but when I visually inspected how the threads were doing (via JVisualVM) I realized that more often than not the threads were idle. I figured I'd try CachedThreadPool as an alternative, but it appears that the pool then consists of a single thread that runs at pretty much 100% throughout the entire while loop. A second pool thread is never created during this process, and the other cores are pretty much idle as well. What's really interesting is that the "main" worker thread, which executes the rest of the stuff in the while loop, is "waiting" practically all the time.

I find this a bit peculiar, since I'd expect at least two threads to be able to run at higher efficiency: one running the FragmentTask and one running the rest of the stuff in the while loop, up to fut.get().

Any ideas as to what could be going on behind the scenes? Is the code "too sequential" for a thread pool to be used?

Solution

The problem is not with the thread pool implementation. You submit one task and then block on its Future before submitting the next one, so at most one task is running at any time and your program is essentially single-threaded.

What you should do is create a Collection of your Callables and use:

    final List<Future<Set<Fragment>>> results
        = executor.invokeAll(yourCollectionOfCallables);

Then loop over your results. The thread pool will do its best to start a new task on a thread as soon as a previous task completes. What is more, invokeAll() returns only after all tasks have completed (successfully or not), so every Future in the list is already done when you iterate over it.
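
As a rough illustration of the invokeAll() approach, here is a minimal, self-contained sketch. It is not the asker's actual code: the class name InvokeAllSketch, the method processAll and the trivial lengthTask (a stand-in for FragmentTask that just returns a string's length) are assumptions made for the example, and a fixed pool is used only to keep the thread count bounded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InvokeAllSketch {

    // Hypothetical stand-in for FragmentTask: computes the length of its input.
    static Callable<Integer> lengthTask(final String input) {
        return () -> input.length();
    }

    // Runs one task per input on a pool of the given size and returns the
    // results in the same order as the inputs.
    static List<Integer> processAll(List<String> inputs, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String input : inputs) {
                tasks.add(lengthTask(input));
            }

            // Blocks until every task has finished, normally or with an exception.
            List<Future<Integer>> futures = executor.invokeAll(tasks);

            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                // Each future is already done here, so get() does not wait.
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("Interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A task failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList("a", "bb", "ccc"), 4)); // prints [1, 2, 3]
    }
}
```

The returned futures come back in the same order as the tasks in the list, which is why the results line up with the inputs even though the tasks may finish in any order.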

OTHER TIPS

You are not using futures in a way that lets the tasks execute in parallel. You need to submit all tasks first and save their futures before calling get() on any of them, because get() blocks until the task has completed.

What you are doing now is submitting a task, which executes on its own thread, and then having the main thread wait for that task to complete. Rinse and repeat.

You say that you expect two threads. That is indeed what you have - the main thread and one executor thread.
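
The "submit everything first, then call get()" pattern could look roughly like the sketch below. The names SubmitThenGetSketch and squareAll, and the squaring task itself, are made up for illustration and only stand in for FragmentTask. Note that for very large inputs, holding every future at once may use a lot of memory, so treat this as a starting point rather than a drop-in fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SubmitThenGetSketch {

    // Squares every input on a pool of the given size (hypothetical workload).
    static List<Integer> squareAll(List<Integer> inputs, int threads) {
        ExecutorService es = Executors.newFixedThreadPool(threads);
        try {
            // Phase 1: submit every task and keep its future; nothing blocks here.
            List<Future<Integer>> futures = new ArrayList<>();
            for (final Integer n : inputs) {
                Callable<Integer> task = () -> n * n;
                futures.add(es.submit(task));
            }

            // Phase 2: only now wait for the results, in submission order.
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get()); // blocks until this task is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("Interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A task failed", e.getCause());
        } finally {
            es.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(squareAll(Arrays.asList(1, 2, 3, 4), 2)); // prints [1, 4, 9, 16]
    }
}
```

Because all tasks are queued before the first get(), the pool can keep several threads busy while the main thread waits on the earliest future.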

Licensed under: CC-BY-SA with attribution