Your results look correct. Lucene doesn't return results in the order in which documents were added; it orders them by relevance score. "This is line 1 with some custom data" matches 8 terms in your query, while "This is line 4" matches only 4, so the former gets a higher score. Scoring involves more than this, including factors that partly offset it (for example, length normalization favors shorter fields), but in this case I believe the number of matched terms is the dominant factor.
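To illustrate why insertion order doesn't matter, here is a deliberately simplified sketch that ranks documents by how many distinct query terms they contain. This is a toy model, not Lucene's actual scoring formula (recent versions use BM25 by default), and the class and method names are hypothetical. The query string in the usage below is also an assumption, chosen so that the two lines match 8 and 4 terms as described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: score = number of distinct query terms found in the document.
// Lucene's real scoring is more involved (term frequency, IDF, length norms, ...).
final class TermCountRanking {

    // Lowercase and split on non-word characters, roughly like a simple analyzer.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Counts how many distinct query terms also occur in the document.
    static int score(String doc, String query) {
        Set<String> docTerms = new HashSet<>(tokenize(doc));
        Set<String> queryTerms = new HashSet<>(tokenize(query));
        int matches = 0;
        for (String q : queryTerms) {
            if (docTerms.contains(q)) {
                matches++;
            }
        }
        return matches;
    }

    // Orders documents by descending score. Insertion order only breaks ties,
    // because List.sort is stable; it never overrides a higher score.
    static List<String> rank(List<String> docs, String query) {
        List<String> ranked = new ArrayList<>(docs);
        ranked.sort(Comparator.comparingInt((String d) -> score(d, query)).reversed());
        return ranked;
    }
}
```

For example, with the documents added in the (hypothetical) order "This is line 4", then "This is line 1 with some custom data", and the query "this is line 1 4 with some custom data", rank puts line 1 first because it scores 8 versus 4, regardless of which was added first.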
You can sort your results by passing a Sort to your search call. Sort.INDEXORDER is the easiest option: it sorts by Doc ID, which may be close enough for your purpose.
However, Doc IDs are not guaranteed to follow insertion order. The more reliable approach is to add a field that records the time or sequence in which each document was indexed, and create a Sort on that field.
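The idea of an explicit ordering field can be sketched without Lucene as below: a counter assigns each document a sequence number at index time, and hits are sorted by that number rather than by internal Doc ID. The class, the field name seq, and the shuffled Doc IDs in the usage are hypothetical, chosen only to model IDs that no longer match insertion order. In recent Lucene versions, the counterpart would be to index the counter as a NumericDocValuesField named, say, "seq", and to pass new Sort(new SortField("seq", SortField.Type.LONG)) to IndexSearcher.search(query, n, sort); treat this as a starting point rather than a drop-in implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Toy model of sorting by an explicit insertion-order field instead of Doc ID.
final class InsertionOrderSketch {

    // A search hit: the engine's internal Doc ID plus our own sequence value.
    static final class Hit {
        final int docId;   // assigned by the engine; may not follow insertion order
        final long seq;    // assigned by us when the document was added
        final String text;

        Hit(int docId, long seq, String text) {
            this.docId = docId;
            this.seq = seq;
            this.text = text;
        }
    }

    // Index-time counter. Note: it lives in memory only, so a real application
    // would need to persist it (or use a timestamp) across restarts.
    private final AtomicLong nextSeq = new AtomicLong();

    // Returns the next sequence value to store with a newly added document.
    long assignSeq() {
        return nextSeq.getAndIncrement();
    }

    // Similar in spirit to Sort.INDEXORDER: order by internal Doc ID.
    static List<Hit> byDocId(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.docId));
        return sorted;
    }

    // Order by the explicit sequence field, i.e. the order documents were added.
    static List<Hit> bySeq(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingLong((Hit h) -> h.seq));
        return sorted;
    }
}
```

If three documents receive seq values 0, 1, 2 but end up with Doc IDs 2, 0, 1, byDocId returns them as "line 2", "line 3", "line 1", while bySeq restores "line 1", "line 2", "line 3". The sequence field is under your control, which is why it is the safer thing to sort on.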