Question

Is there a way to tell Lucene to store documents on disk in some predefined order?

For instance, I have documents with sparse, but sorted, IDs (i.e. in a LongField named ID). I want to load them one by one, in sorted order, from the Lucene index. In a perfect world, this would happen magically just by iterating from 0 to IndexReader.maxDoc() - 1 and loading IndexReader.document(i).

Is this possible?

Solution

This is possible using a SortingMergePolicy, which sorts your documents whenever a merge occurs.

Here's an example (gist).

First, you create a SortingMergePolicy:

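    // The third SortField argument is "reverse": false means ascending order.
    boolean reverse = false;
    SortField idSortField = new SortField("id", SortField.Type.LONG, reverse);

    // iwc is the IndexWriterConfig you will pass to your IndexWriter.
    SortingMergePolicy sortingMP = new SortingMergePolicy(
            iwc.getMergePolicy(), new Sort(idSortField));
    iwc.setMergePolicy(sortingMP);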

Then, you index your documents as usual, in whatever order you like:

    // iw is an IndexWriter created from iwc.
    Document d = new Document();
    d.add(new LongField("id", 4L, Field.Store.YES));
    iw.addDocument(d);

    d = new Document();
    d.add(new LongField("id", 2L, Field.Store.YES));
    iw.addDocument(d);

You just need to force a merge before opening your IndexReader:

    // Merge down to a single segment and wait for the merge to finish.
    iw.forceMerge(1, true);

Now, if you open an IndexReader on this index, you can iterate over your documents from 0 to IndexReader.maxDoc() - 1 and they will be sorted by your LongField.
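To check the resulting order, something like the following sketch could be used. It assumes Lucene 4.x on the classpath and an index that has already been force-merged into a single segment, so that no deleted documents remain. The class and method names (`SortedIndexDump`, `readIds`) are made up for illustration and are not part of Lucene.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;

// Hypothetical helper for illustration; not part of Lucene.
public class SortedIndexDump {

    /** Returns the stored "id" values in doc ID order (0 .. maxDoc() - 1). */
    public static List<Long> readIds(Directory dir) throws IOException {
        List<Long> ids = new ArrayList<Long>();
        IndexReader reader = DirectoryReader.open(dir);
        try {
            // Assumption: the index was force-merged, so every doc ID is live.
            for (int i = 0; i < reader.maxDoc(); i++) {
                Document doc = reader.document(i);
                // "id" was indexed as a stored LongField, so it has a numeric value.
                ids.add(doc.getField("id").numericValue().longValue());
            }
        } finally {
            reader.close();
        }
        return ids;
    }
}
```

If the merge policy sorted the segment as expected, the returned list should be in ascending order of "id".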

A few notes:

  • This class lives in lucene-misc, so you might have to add that as an additional dependency.
  • The API changed with the 4.8.0 release, which requires a Sort instead of a Sorter.
  • You can also do live sorting or offline sorting (example in the gist).
Licensed under: CC-BY-SA with attribution