I'm trying to improve the performance of indexing my Lucene files. For this I created a worker, LuceneWorker, that does the job.

Given the code below, the concurrent execution becomes significantly slow. I think I know why: the list of futures grows to the point where there is hardly any memory left to run yet another LuceneWorker task.

Q: Is there a way to limit the number of workers that go into the executor? In other words, if there are n pending futures, stop submitting and let the documents be indexed first?

My intuitive approach is to build a producer/consumer with an ArrayBlockingQueue, but I'd like to know whether I'm on the right track before I redesign it.

        ExecutorService executor = Executors.newFixedThreadPool(cores);
        List<Future<List<Document>>> futures = new ArrayList<Future<List<Document>>>();

        for (File file : files)
        {
            if (isFileIndexingOK(file))
            {
                System.out.println(file.getName());
                Future<List<Document>> future = executor.submit(new LuceneWorker(file, indexSearcher));
                futures.add(future);
            }
            else
            {
                System.out.println("NOT A VALID FILE FOR INDEXING: " + file.getName());
            }
        }

        for (Future<List<Document>> future : futures)
        {
            try
            {
                List<Document> docs = future.get();

                for (Document doc : docs)
                    writer.addDocument(doc);
            }
            catch (Exception exp)
            {
                // exception handling comes here.
            }
        }

Solution

If you want to limit the number of waiting jobs, use a ThreadPoolExecutor with a bounded queue such as an ArrayBlockingQueue, and roll your own RejectedExecutionHandler so that the submitting thread waits for capacity in the queue. You cannot use the convenience methods in Executors for this, because newFixedThreadPool uses an unbounded LinkedBlockingQueue.
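
A minimal sketch of that idea (the handler name and the queue capacity of 100 are arbitrary choices for illustration, not values from the question): the handler puts the rejected task back into the bounded queue, which blocks the submitting thread until a worker frees up a slot.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.RejectedExecutionException;
    import java.util.concurrent.RejectedExecutionHandler;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    // Blocks the submitting thread until the bounded work queue has room again,
    // instead of throwing RejectedExecutionException.
    class BlockWhenQueueFull implements RejectedExecutionHandler {
        @Override
        public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
            try {
                executor.getQueue().put(r);   // waits for a free slot in the queue
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RejectedExecutionException("Interrupted while waiting for queue space", e);
            }
        }
    }

    // Fixed-size pool (like newFixedThreadPool) but with a bounded queue:
    // at most 'cores' running tasks plus 100 queued ones at any time.
    int cores = Runtime.getRuntime().availableProcessors();
    ThreadPoolExecutor executor = new ThreadPoolExecutor(
            cores, cores,
            0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<Runnable>(100),
            new BlockWhenQueueFull());

Note that re-queuing via executor.getQueue().put(r) is a common but slightly fragile pattern: if the pool is shut down concurrently, a re-queued task may never run. The built-in ThreadPoolExecutor.CallerRunsPolicy is a simpler alternative when it is acceptable for the submitting thread to run the task itself.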

Other tips

Depending on the input size and the complexity of the LuceneWorker class, I could also imagine solving this problem at least partially with the Fork/Join framework. With the CountedCompleter implementation introduced in JDK 8 (and available earlier via jsr166y), the I/O operations would not cause problems, since a CountedCompleter signals completion through tryComplete() rather than blocking on join().
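
For completeness, a rough sketch of what that could look like. The class names, the ConcurrentLinkedQueue collector, and the stubbed-out parsing are all assumptions for illustration, not the asker's LuceneWorker: a parent completer forks one child per file and only completes once every child has called tryComplete().

    import java.io.File;
    import java.util.List;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.CountedCompleter;
    import java.util.concurrent.ForkJoinPool;

    import org.apache.lucene.document.Document;

    // Hypothetical sketch: fork one completer per file; the root task completes
    // once all children have reported in via tryComplete().
    class IndexAllFiles extends CountedCompleter<Void> {
        private final List<File> files;
        private final ConcurrentLinkedQueue<Document> parsedDocs;

        IndexAllFiles(List<File> files, ConcurrentLinkedQueue<Document> parsedDocs) {
            super(null);
            this.files = files;
            this.parsedDocs = parsedDocs;
        }

        @Override
        public void compute() {
            for (File file : files) {
                addToPendingCount(1);                 // one pending child per file
                new IndexOneFile(this, file, parsedDocs).fork();
            }
            tryComplete();                            // completes when the pending count reaches zero
        }

        private static final class IndexOneFile extends CountedCompleter<Void> {
            private final File file;
            private final ConcurrentLinkedQueue<Document> parsedDocs;

            IndexOneFile(CountedCompleter<?> parent, File file,
                         ConcurrentLinkedQueue<Document> parsedDocs) {
                super(parent);
                this.file = file;
                this.parsedDocs = parsedDocs;
            }

            @Override
            public void compute() {
                // Parse 'file' into Lucene Documents here (the I/O-heavy part)
                // and add them to parsedDocs, e.g. parsedDocs.addAll(parse(file)).
                tryComplete();                        // decrement the parent's pending count
            }
        }
    }

The caller would run it with something like ForkJoinPool.commonPool().invoke(new IndexAllFiles(files, docs)) and afterwards feed the collected documents to the IndexWriter from a single thread.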
