Question

I've had this long term issue in not quite understanding how to implement a decent Lucene sort or ranking. Say I have a list of cities and their populations. If someone searches "new" or "london" I want the list of prefix matches ordered by population, and I have that working with a prefix search and an sort by field reversed, where there is a population field, IE New Mexico, New York; or London, Londonderry.

However I also always want the exact matching name to be at the top. So in the case of "London" the list should show "London, London, Londonderry" where the first London is in the UK and the second London is in Connecticut, even if Londonderry has a higher population than London CT.

Does anyone have a single query solution?

Was it helpful?

Solution

dlamblin,let me see if I get this correctly: You want to make a prefix-based query, and then sort the results by population, and maybe combine the sort order with preference for exact matches. I suggest you separate the search from the sort and use a CustomSorter for the sorting: Here's a blog entry describing a custom sorter. The classic Lucene book describes this well.

OTHER TIPS

API for

Sortcomparator

says

There is a distinct Comparable for each unique term in the field - if some documents have the same term in the field, the cache array will have entries which reference the same Comparable

You can apply a

FieldSortedHitQueue

to the sortcomparator which has a Comparator field for which the api says ...

Stores a comparator corresponding to each field being sorted by.

Thus the term can be sorted accordingly

My current solution is to create an exact searcher and a prefix searcher, both sorted by reverse population, and then copy out all my hits starting from the exact hits, moving to the prefix hits. It makes paging my results slightly more annoying than I think it should be.

Also I used a hash to eliminate duplicates but later changed the prefix searcher into a boolean query of a prefix search (MUST) with an exact search (MUST NOT), to have Lucene remove the duplicates. Though this seemed even more wasteful.

Edit: Moved to a comment (since the feature now exists): Yuval F Thank you for your blog post ... How would the sort comparator know that the name field "london" exactly matches the search term "london" if it cannot access the search term?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top