Question

In my document import method, I work with a large number of files; each file can be 100 MB-200 MB. I want to process them asynchronously with threads. In a for loop, each file is processed and then indexed (Lucene). This operation is very costly and time-consuming at runtime, and the overall import must not stop.

The general structure of the import method is given below:

public void docImport()
{
  ExecutorService executor = Executors.newFixedThreadPool(5);
  for(final File file : fileList)
  {
    //Do some works...
    executor.execute(new Runnable() {
        @Override
        public void run() {
           zipFile(file);   // Each zipped file has a different name, in the same directory.
           indexFile(file); // Each file is indexed into the same index directory.
        }
    });
  }
  executor.shutdown();
}

The general structure of the indexFile method:

public void indexFile(File file)
{
  IndexWriter writer = null;
  Directory dir = .....;
  Analyzer analyzer = new StandardAnalyzer(LUCENE_VERSION);
  IndexWriterConfig iwc = new IndexWriterConfig(LUCENE_VERSION, analyzer);
  iwc.setRAMBufferSizeMB(200);
  iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
  writer = new IndexWriter(dir, iwc);
  Document lucenedoc = new Document();
  lucenedoc.add(..);

  if (writer.getConfig().getOpenMode() == IndexWriterConfig.OpenMode.CREATE) {
    writer.addDocument(lucenedoc);
  } else {
    writer.updateDocument(new Term(PATH, innerPath), lucenedoc);
  }
}

My question is:

While the docImport method is running, 5 threads read files and each thread tries to index its files into the same Lucene index. So this error occurs from time to time: "org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@C:\lucene\index\write.lock"

For example, sometimes only 30 out of 100 files get indexed; the others are not indexed because of the error.

How can I resolve this error? How can I handle this?


Solution

You're getting this error when you attempt to open an IndexWriter while there is already a writer open on the same index.

In addition to that issue, opening a new IndexWriter is a very expensive operation. Even if you were to get it working (say, by synchronizing a block that opens, uses and then closes the IndexWriter), it would likely be quite slow.

Instead, open one IndexWriter, keep it open, and share it across each of the threads.
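
IndexWriter is thread-safe, so the pooled worker threads can call it directly without extra locking. Below is a minimal sketch of that approach, using the Lucene 4.x-style APIs from your question; LUCENE_VERSION, PATH, zipFile(...) and the per-file document fields are taken (or assumed) from your code, so adjust them to match your project:

import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class DocImporter
{
  // LUCENE_VERSION, PATH and zipFile(...) are assumed to be defined as in the question.
  private final IndexWriter writer;

  public DocImporter(File indexDir) throws IOException
  {
    Directory dir = FSDirectory.open(indexDir);
    Analyzer analyzer = new StandardAnalyzer(LUCENE_VERSION);
    IndexWriterConfig iwc = new IndexWriterConfig(LUCENE_VERSION, analyzer);
    iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
    iwc.setRAMBufferSizeMB(200);
    writer = new IndexWriter(dir, iwc); // opened once, shared by every worker thread
  }

  public void docImport(List<File> fileList) throws InterruptedException, IOException
  {
    ExecutorService executor = Executors.newFixedThreadPool(5);
    for (final File file : fileList)
    {
      executor.execute(new Runnable() {
          @Override
          public void run() {
             zipFile(file);
             indexFile(file); // all threads index through the same writer
          }
      });
    }
    executor.shutdown();
    executor.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS); // wait for the workers to finish
    writer.close(); // close the single writer once, at the very end
  }

  private void indexFile(File file)
  {
    try {
      Document lucenedoc = new Document();
      lucenedoc.add(new StringField(PATH, file.getPath(), Field.Store.YES));
      // updateDocument replaces an existing document with the same path,
      // or simply adds it if the index does not contain one yet.
      writer.updateDocument(new Term(PATH, file.getPath()), lucenedoc);
    } catch (IOException e) {
      // log and continue, so one bad file does not stop the whole import
    }
  }
}

The key points are that the writer is opened once, every indexFile call goes through that same instance, and writer.close() runs only after awaitTermination guarantees that no worker is still indexing.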

Licensed under: CC-BY-SA with attribution