Question

I'm trying to speed up the indexing of my Lucene files. For this, I created a worker, LuceneWorker, that does the job.

With the code below, the 'concurrent' execution becomes significantly slower. I think I know why: the list of futures grows to the point where there is hardly any memory left to run yet another LuceneWorker task.

Q: Is there a way to limit the number of workers that go into the executor? In other words, if there are already 'n' futures, can submission stop until those documents have been indexed?

My intuition is that I should build a producer/consumer with an ArrayBlockingQueue, but I'd like to know whether I'm right before I redesign it.

        ExecutorService executor = Executors.newFixedThreadPool(cores);
        List<Future<List<Document>>> futures = new ArrayList<Future<List<Document>>>(3);

        // Submit one LuceneWorker per valid file; every task is queued immediately.
        for (File file : files)
        {
            if (isFileIndexingOK(file))
            {
                System.out.println(file.getName());
                Future<List<Document>> future = executor.submit(new LuceneWorker(file, indexSearcher));
                futures.add(future);
            }
            else
            {
                System.out.println("NOT A VALID FILE FOR INDEXING: " + file.getName());
            }
        }

        // Collect the results in submission order and add the documents to the index.
        for (Future<List<Document>> future : futures)
        {
            try
            {
                List<Document> docs = future.get();   // blocks until this worker has finished

                for (Document doc : docs)
                    writer.addDocument(doc);
            }
            catch (Exception exp)
            {
                // exception handling goes here
            }
        }

Solution

If you want to limit the number of waiting jobs, use a ThreadPoolExecutor with a bounded queue such as ArrayBlockingQueue, and write your own RejectedExecutionHandler that makes the submitting thread wait for capacity in the queue. You cannot use the convenience methods in Executors for this, because newFixedThreadPool uses an unbounded LinkedBlockingQueue.
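
A minimal sketch of that idea follows. The names BoundedExecutors, BlockWhenFull and newBoundedPool are made up for illustration, and the queue capacity is something you would have to tune yourself:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, not part of the JDK or of Lucene.
class BoundedExecutors {

    /** Blocks the submitting thread until the bounded queue has room again. */
    static final class BlockWhenFull implements RejectedExecutionHandler {
        @Override
        public void rejectedExecution(Runnable task, ThreadPoolExecutor pool) {
            if (pool.isShutdown()) {
                throw new RejectedExecutionException("executor has been shut down");
            }
            try {
                // put() waits for free capacity instead of failing.
                pool.getQueue().put(task);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RejectedExecutionException("interrupted while waiting for queue space", e);
            }
        }
    }

    /** Fixed-size pool whose work queue holds at most queueCapacity waiting tasks. */
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,                 // core size == max size, like newFixedThreadPool
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                new BlockWhenFull());
    }
}
```

In the question's code this would replace Executors.newFixedThreadPool(cores) with something like BoundedExecutors.newBoundedPool(cores, cores * 2); the submit loop then stalls whenever the queue is full. Two caveats: putting a task directly into the queue bypasses the executor's own state checks, so a task enqueued just as the pool shuts down may never run, and this only bounds the waiting jobs. Results of finished workers still stay in memory until their futures are consumed.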

OTHER TIPS

Depending on the typical input size and the complexity of the LuceneWorker class, I could imagine solving this problem at least partially with the Fork/Join framework. JDK 8's CountedCompleter (also available in jsr166y) is documented as being more robust than other ForkJoinTask forms when subtasks stall or block, so the I/O done by the workers should be less of a problem.
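
As a rough illustration of what a CountedCompleter-based loop can look like, here is a sketch modelled on the pattern from the class's Javadoc. ForEachTask and its action parameter are hypothetical names; in the question's setting the action would be whatever indexes a single file, which is an assumption on my part:

```java
import java.util.List;
import java.util.concurrent.CountedCompleter;
import java.util.function.Consumer;

// Hypothetical task: applies an action to every list element using Fork/Join.
class ForEachTask<T> extends CountedCompleter<Void> {
    private final List<T> items;
    private final Consumer<T> action;
    private final int lo;
    private final int hi;

    ForEachTask(CountedCompleter<?> parent, List<T> items, Consumer<T> action, int lo, int hi) {
        super(parent);
        this.items = items;
        this.action = action;
        this.lo = lo;
        this.hi = hi;
    }

    /** Runs the action on all items and returns once every subtask has completed. */
    static <T> void forEach(List<T> items, Consumer<T> action) {
        new ForEachTask<>(null, items, action, 0, items.size()).invoke();
    }

    @Override
    public void compute() {
        int l = lo;
        int h = hi;
        // Repeatedly fork off the right half and keep working on the left half.
        while (h - l >= 2) {
            int mid = (l + h) >>> 1;
            addToPendingCount(1);              // register the child before forking it
            new ForEachTask<>(this, items, action, mid, h).fork();
            h = mid;
        }
        if (h > l) {
            action.accept(items.get(l));       // exactly one element left for this task
        }
        tryComplete();                         // propagate completion towards the root
    }
}
```

The action is called from several pool threads at once, so it has to be thread-safe. Whether this actually eases the memory pressure described in the question depends on what the action keeps in memory, so treat it as a starting point rather than a drop-in fix.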

Licensed under: CC-BY-SA with attribution