Elasticsearch: How to prevent the increase of score when search term appears multiple times in document?

StackOverflow https://stackoverflow.com/questions/22999414

  •  01-07-2023

Question

When a search term appears several times in a document, that document's score goes up. This is usually what you want, but not in my case.

The query:

"query": {
  "bool": {
    "should": {
      "nested": {
        "path": "editions",
        "query": {
          "match": {
            "title_author": {
              "query": "look me up",
              "operator": "and",
              "boost": 2
            }
          }
        }
      }
    },
    "must": {
      "nested": {
        "path": "editions",
        "query": {
          "match": {
            "title_author": {
              "query": "look me up",
              "operator": "and",
              "fuzziness": 0.5,
              "boost": 1
            }
          }
        }
      }
    }
  }
}

doc_1

{
  "editions": [
    {
      "editionid": 1,
      "title_author": "look me up look me up"
    },
    {
      "editionid": 2,
      "title_author": "something else"
    }
  ]
}

and doc_2

{
  "editions": [
    {
      "editionid": 3,
      "title_author": "look me up"
    },
    {
      "editionid": 4,
      "title_author": "something else"
    }
  ]
}

With this query, doc_1 gets a higher score because the search terms occur twice in one of its editions. I don't want that. How do I turn this behavior off? I want the same score whether the search terms are found once or twice in the matching document.

Solution

In addition to what @keety and @Sid1199 described, there is another way to do this: fields of type "text" have a mapping parameter called index_options. It defaults to "positions", but you can explicitly set it to "docs". Term frequencies are then not stored in the index, so Elasticsearch cannot tell how often a term is repeated when it scores a match. Positions are not stored either, so phrase queries on this field will no longer work.

"title_author": {
    "type": "text",
    "index_options": "docs"
}
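
For context, here is a minimal sketch of how that field definition might sit inside a complete index-creation body for the documents in the question. It assumes the typeless mapping format of Elasticsearch 7 and later, that "editions" is mapped as a nested field, and that "editionid" is an integer; none of this is stated in the question, so adjust it to your actual mapping.

```json
{
  "mappings": {
    "properties": {
      "editions": {
        "type": "nested",
        "properties": {
          "editionid": { "type": "integer" },
          "title_author": {
            "type": "text",
            "index_options": "docs"
          }
        }
      }
    }
  }
}
```

As far as I know, index_options cannot be changed on a field that already exists, so expect to create a new index with this mapping and reindex your documents into it.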

OTHER TIPS

Elasticsearch has a mapping parameter called "similarity". Several similarity types are available, but the useful one here is "boolean". With "similarity" set to "boolean" in your mapping, a matching term contributes a score based only on its query boost, no matter how many times it occurs in the field.

"title_author":{"type":"text","similarity":"boolean"}

If you run your query against this mapping, each matching term is boosted only once, regardless of how many times the word appears. You can read more about similarities in the Elasticsearch documentation.

This is only available in Elasticsearch 5.4 and above.
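
If other queries still need frequency-aware scoring on the same text, one possible variation is to put the boolean similarity on a multi-field instead of the main field. The sketch below assumes the Elasticsearch 7+ typeless mapping format and a nested "editions" field; the sub-field name "unscored" is hypothetical and chosen only for illustration.

```json
{
  "mappings": {
    "properties": {
      "editions": {
        "type": "nested",
        "properties": {
          "title_author": {
            "type": "text",
            "fields": {
              "unscored": {
                "type": "text",
                "similarity": "boolean"
              }
            }
          }
        }
      }
    }
  }
}
```

With a mapping like this, the query from the question would target "editions.title_author.unscored" when repetitions should be ignored, and "editions.title_author" when the default scoring is wanted.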

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow