Question

Question 1: In Lucene's SpanNearQuery (or span_near in Elasticsearch), what is the exact meaning of slop? Is it the number of words separating the two matching words, or is it that number plus 1?

For example, suppose your indexed text is: foo bar biz

Which queries would match this text: "foo biz"~0, "foo biz"~1, "foo biz"~2

I would expect that the first wouldn't match and the last would. But what about the middle?

Question 2: Now a second, more complex follow-up question: how is slop handled if there are more than two search clauses? Is it applied to each pair of clauses, or to any pair of clauses?

For example, suppose you construct a SpanNearQuery with three clauses: foo, bar, biz. What slop is needed to match the same indexed text above? I would expect a slop of 2 definitely would, but what about 0 or 1?

Similarly, with the same three-clause query, what slop is needed to match the text: foo bar ble biz

Solution

Question 1: Slop is the number of words separating the span clauses, so a slop of 0 means they are adjacent. In the example I gave, a slop of 0 would not match, while a slop of 1 or 2 would.

Question 2: When there are more than two span near clauses, each clause must be connected to at least one other clause by no more than slop separating words, AND all of the clauses must be connected to each other through a chain. However, each clause need not be within slop words of every other clause.

For the first example in question 2: slop of 0, 1, and 2 would all match. Slop of 0 matches even though foo and biz are separated by one word, because there is a chain through all clauses.

For the second example in question 2: slop of 0 would not match because biz is separated from every other clause by more than 0 words. Slop of 1 would match because foo and bar are separated by 0 words and bar and biz are separated by 1 word. It matches even though foo and biz are separated by more than one word, because there is a chain through all clauses. Slop of 2 would obviously match.
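The examples above can be checked with a small simulation. This is only a sketch, not Lucene's implementation: it assumes whitespace tokenization, single-word clauses, in-order matching, and that slop counts the unmatched positions inside the matched span (the "intervening unmatched positions" wording of the Elasticsearch documentation). The names ordered_slop and matches are hypothetical helpers. For texts with gaps between several clauses, this total-count reading and the per-pair chain reading can give different answers, so verify against your Lucene version.

```python
def ordered_slop(text, terms):
    """Return the smallest number of unmatched positions in any in-order
    match of `terms` within `text`, or None if there is no match.
    Assumption: each clause is a single whitespace-separated token."""
    tokens = text.split()
    best = None
    for start, token in enumerate(tokens):
        if token != terms[0]:
            continue
        pos = start
        found = True
        for term in terms[1:]:
            try:
                # Earliest occurrence of the next term after the previous one.
                pos = tokens.index(term, pos + 1)
            except ValueError:
                found = False
                break
        if found:
            width = pos - start + 1      # positions covered by the span
            slop = width - len(terms)    # positions not matched by a clause
            best = slop if best is None else min(best, slop)
    return best


def matches(text, terms, slop):
    """True if `terms` match `text` in order within the allowed slop."""
    needed = ordered_slop(text, terms)
    return needed is not None and needed <= slop


if __name__ == "__main__":
    for slop in (0, 1, 2):
        print("foo biz", slop, matches("foo bar biz", ["foo", "biz"], slop))
        print("foo bar biz", slop, matches("foo bar ble biz", ["foo", "bar", "biz"], slop))
```

Under these assumptions, the two-clause query on "foo bar biz" needs a slop of at least 1, the three-clause query on "foo bar biz" needs 0, and the three-clause query on "foo bar ble biz" needs 1, which agrees with the answers above.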

OTHER TIPS

It's explained in the Span near query documentation:

Matches spans which are near one another. One can specify slop, the maximum number of intervening unmatched positions, as well as whether matches are required to be in-order. The span near query maps to Lucene SpanNearQuery.

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-span-near-query.html

Example: you want to match "Mr. Bush" and get details for the names below. Since two non-matching words sit between "Mr." and "Bush" in each name, the slop value is 2.

Mr. Jeorge Willam Bush, Mr Sean Willam Bush, Mr James Kane Bush

Sample DSL request:

    GET school/_search
    {
      "query": {
        "match_phrase": {
          "EmpName": {
            "query": "Mr. Bush",
            "slop": 2
          }
        }
      }
    }
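
The request above uses match_phrase, while the question asks about span_near. As a rough sketch, a comparable span_near body could be built as follows. The assumptions here are not from the original answer: that EmpName is analyzed with a lowercasing analyzer such as the standard analyzer, so the indexed terms are "mr" and "bush" (span_term looks up exact indexed terms and is not analyzed), and that the terms should appear in order.

```python
import json

# Hypothetical span_near equivalent of the match_phrase request above.
# Assumption: the indexed terms for "Mr." and "Bush" are "mr" and "bush".
body = {
    "query": {
        "span_near": {
            "clauses": [
                {"span_term": {"EmpName": "mr"}},
                {"span_term": {"EmpName": "bush"}},
            ],
            "slop": 2,         # up to two unmatched positions between the terms
            "in_order": True,  # "mr" must come before "bush"
        }
    }
}

if __name__ == "__main__":
    # Print the JSON body, e.g. to paste under GET school/_search.
    print(json.dumps(body, indent=2))
```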
Licensed under: CC-BY-SA with attribution