Question

My question is about Lucene. I am trying to retrieve some specific documents from the index.

For example:

I am reading a .txt file and creating an index from it, where each line is a new document. What I am trying to achieve is to select only a few of these documents, but with a single search query.

Input (each line below is a separate document):
This is line 1 with some custom data
This is line 2 with random data
This is line 3 with unwanted data
This is line 4
This is line 1 with some custom data
This is line 2
This is line 3
This is line 4
This is line 1 with some custom data
This is line 2
This is line 3
This is line 4

Expected output:

This is line 1 with some custom data
This is line 4
This is line 1 with some custom data
This is line 4
This is line 1 with some custom data
This is line 4

But the output I get is:

Found 6 hits.
1. This is line 1 with some custom data
2. This is line 1 with some custom data
3. This is line 1 with some custom data
4. This is line 4
5. This is line 4
6. This is line 4

Can anyone help me with a code snippet showing how to achieve this in a single search query, or tell me which kind of query parser would be useful?

I appreciate your help.


Solution

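Your results look correct. Lucene does not return documents in the order in which they were added; by default it orders results by relevance score. "This is line 1 with some custom data" matches 8 terms in your query, while "This is line 4" matches only 4, so the former gets a higher score. Scoring is more complex than that, and some factors (such as field-length normalization, which favors shorter fields) would partly offset the difference, but in this case I believe the number of matched terms is the dominant factor. Among documents with equal scores, Lucene puts the lower doc ID first, which is why the copies of each line still appear in their original relative order.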
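To make this concrete, here is a toy model in plain Java (no Lucene dependency). It is a sketch under loud assumptions: the question does not show its query, so the query here is a hypothetical pair of phrases (one per wanted line), and the "score" is just the number of terms in the matched phrase, which is the simplification used above and not Lucene's real scoring formula. Ties are broken by lower doc ID first, as Lucene's default relevance sort does, and doc IDs are assumed to be the 0-based line positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Toy model of relevance ranking: NOT Lucene's scoring (no idf, no length norms).
public class RelevanceToy {

    // The twelve input lines; doc ID = 0-based position in this array (assumption).
    static final String[] DOCS = {
        "This is line 1 with some custom data",
        "This is line 2 with random data",
        "This is line 3 with unwanted data",
        "This is line 4",
        "This is line 1 with some custom data",
        "This is line 2",
        "This is line 3",
        "This is line 4",
        "This is line 1 with some custom data",
        "This is line 2",
        "This is line 3",
        "This is line 4",
    };

    // Hypothetical query: two phrases, one per line we want back.
    static final String[] PHRASES = {
        "This is line 1 with some custom data",
        "This is line 4",
    };

    // Lowercase and split on whitespace (a crude stand-in for an analyzer).
    static List<String> tokens(String s) {
        return Arrays.asList(s.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Score = term count of the longest phrase the doc contains; 0 means no match.
    static int score(String doc) {
        List<String> docTokens = tokens(doc);
        int best = 0;
        for (String phrase : PHRASES) {
            List<String> p = tokens(phrase);
            if (Collections.indexOfSubList(docTokens, p) >= 0) {
                best = Math.max(best, p.size());
            }
        }
        return best;
    }

    // Matching doc IDs ordered by score descending, ties by lower doc ID first.
    static List<Integer> rank() {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < DOCS.length; id++) {
            if (score(DOCS[id]) > 0) {
                hits.add(id);
            }
        }
        hits.sort(Comparator.comparingInt((Integer id) -> -score(DOCS[id]))
                .thenComparingInt(id -> id));
        return hits;
    }

    public static void main(String[] args) {
        List<Integer> hits = rank();
        System.out.println("Found " + hits.size() + " hits.");
        for (int i = 0; i < hits.size(); i++) {
            System.out.println((i + 1) + ". " + DOCS[hits.get(i)]);
        }
    }
}
```

Running `main` prints the same six hits, in the same order, as the output in the question: the three 8-term matches first, then the three 4-term matches.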

You can sort your results by passing a Sort to your search call. Sort.INDEXORDER is the easiest option: it sorts by doc ID, which would serve your purpose to some extent.
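A toy sketch of what Sort.INDEXORDER does, again in plain Java rather than Lucene: the same hits are re-sorted by doc ID instead of by score. It assumes (hypothetically) that doc IDs equal the 0-based line positions in the file. In Lucene itself, the corresponding call is `searcher.search(query, n, Sort.INDEXORDER)` on an `IndexSearcher`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of Sort.INDEXORDER (plain Java, not Lucene): order hits by doc ID.
public class IndexOrderToy {

    // A search hit: internal doc ID plus the indexed line's text.
    record Hit(int docId, String text) {}

    // Returns a copy of the hits sorted by ascending doc ID.
    static List<Hit> sortByIndexOrder(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt(Hit::docId));
        return sorted;
    }

    // The hits in the relevance order shown in the question.
    // Assumption: doc ID = 0-based position of the line in the file.
    static List<Hit> relevanceOrderedHits() {
        return List.of(
            new Hit(0, "This is line 1 with some custom data"),
            new Hit(4, "This is line 1 with some custom data"),
            new Hit(8, "This is line 1 with some custom data"),
            new Hit(3, "This is line 4"),
            new Hit(7, "This is line 4"),
            new Hit(11, "This is line 4"));
    }

    public static void main(String[] args) {
        for (Hit h : sortByIndexOrder(relevanceOrderedHits())) {
            System.out.println(h.text());
        }
    }
}
```

Under that assumption, `main` prints the expected output from the question, alternating line 1 and line 4.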

However, doc IDs are not guaranteed to follow insertion order. A more reliable approach is to add a field that records the time or order in which each document was indexed, and create a Sort on that field.
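The field-based approach can be sketched the same way. In this plain-Java toy, each hit carries its internal doc ID plus a `lineNo` value stored at indexing time (a hypothetical field name), and the doc IDs are deliberately scrambled to mimic renumbering, e.g. after segment merges. Sorting by doc ID then no longer reproduces file order, while sorting by `lineNo` does. In recent Lucene versions the analogue would be to add something like `new NumericDocValuesField("lineNo", n)` to each document and search with `new Sort(new SortField("lineNo", SortField.Type.LONG))`; treat that as a pointer to check against your Lucene version, not as tested code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model, not Lucene code: sorting on an explicit insertion-order field
// stays correct even when doc IDs no longer follow insertion order.
public class OrderFieldToy {

    // docId: internal ID (may change, e.g. after segment merges).
    // lineNo: hypothetical 1-based line number we stored when indexing.
    record Hit(int docId, long lineNo, String text) {}

    // Returns a copy of the hits sorted with the given comparator.
    static List<Hit> sortBy(List<Hit> hits, Comparator<Hit> order) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(order);
        return sorted;
    }

    // Hypothetical state: doc IDs were renumbered and no longer match file order.
    static List<Hit> hits() {
        return List.of(
            new Hit(5, 1, "This is line 1 with some custom data"),
            new Hit(2, 4, "This is line 4"),
            new Hit(0, 5, "This is line 1 with some custom data"),
            new Hit(4, 8, "This is line 4"),
            new Hit(1, 9, "This is line 1 with some custom data"),
            new Hit(3, 12, "This is line 4"));
    }

    public static void main(String[] args) {
        System.out.println("By doc ID:");
        sortBy(hits(), Comparator.comparingInt(Hit::docId))
            .forEach(h -> System.out.println("  " + h.text()));
        System.out.println("By lineNo field:");
        sortBy(hits(), Comparator.comparingLong(Hit::lineNo))
            .forEach(h -> System.out.println("  " + h.text()));
    }
}
```

The design point is that `lineNo` is data you control, so its order survives whatever Lucene does to internal doc IDs.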

Licensed under: CC-BY-SA with attribution