Question

I'm pretty sure this has to do with stemming, but I'm not sure what I need to change to get spelling suggestions to return whole words.

Settings are:

ELASTICSEARCH_INDEX_SETTINGS = {
  'settings': {
    "analysis": {
        "analyzer": {
            "default": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["standard", "lowercase", "stop_words", "cm_snow"]
            },
            "ngram_analyzer": {
                "type": "custom",
                "tokenizer": "lowercase",
                "filter": ["haystack_ngram"]
            },
            "edgengram_analyzer": {
                "type": "custom",
                "tokenizer": "lowercase",
                "filter": ["haystack_edgengram"]
            }
        },
        "tokenizer": {
            "haystack_ngram_tokenizer": {
                "type": "nGram",
                "min_gram": 3,
                "max_gram": 15,
            },
            "haystack_edgengram_tokenizer": {
                "type": "edgeNGram",
                "min_gram": 2,
                "max_gram": 15,
                "side": "front"
            }
        },
        "filter": {
            "haystack_ngram": {
                "type": "nGram",
                "min_gram": 3,
                "max_gram": 15
            },
            "haystack_edgengram": {
                "type": "edgeNGram",
                "min_gram": 2,
                "max_gram": 15
            },
            "cm_snow": {
                "type": "snowball",
                "language": "English"
            },
            "stop_words": {
                "type": "stop",
                "ignore_case": True,
                "stopwords": STOP_WORDS
            }
        }
    }
  }
}

If I do the following query to Elasticsearch:

curl -XPOST 'localhost:9200/listing/_suggest' -d '{
  "my-suggestion" : {
    "text" : "table",
    "term" : {
      "field" : "text"
    }
  }
}'

I get back:

{"text":"tabl","offset":0,"length":5,"options":[]}

Why is the result "tabl", even for a correctly-spelled word?


Solution

The problem is that I was using the default analyzer, and the default analyzer includes the snowball filter, so the words were getting indexed as their stems. The suggester analyzes the input text the same way, which is why even a correctly spelled "table" came back as "tabl".
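To make the effect concrete, here is a small, self-contained Python sketch. It does not talk to Elasticsearch: `toy_stem` is a crude stand-in that I made up for illustration (it is not the Snowball algorithm), and `difflib.get_close_matches` stands in for the term suggester. The point is only that a suggester can return nothing but the terms that were actually indexed.

```python
import difflib

def toy_stem(token):
    # Hypothetical, deliberately crude stemmer; NOT the real Snowball rules.
    for suffix in ("es", "e", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def index_terms(texts, stem):
    # Build the set of terms the "index" would hold for these texts.
    terms = set()
    for text in texts:
        for token in text.lower().split():
            terms.add(toy_stem(token) if stem else token)
    return terms

def suggest(word, terms, n=2):
    # Suggestions can only ever be terms that exist in the index.
    return difflib.get_close_matches(word, sorted(terms), n=n, cutoff=0.6)

docs = ["Table", "tables", "chairs"]
stemmed_index = index_terms(docs, stem=True)    # holds stems such as "tabl"
plain_index = index_terms(docs, stem=False)     # holds whole words

print(suggest("tabls", stemmed_index))
print(suggest("tabls", plain_index))
```

With the stemmed index, the only candidates are stems, so no whole word can come back; with the unstemmed index, the candidates are the original words.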

Because we still want to search on stemmed words, I added an extra field to my document called suggest that uses the standard analyzer. Into that, I put a text blob of a bunch of the words of that document (title, description, tags) and marked it as include_in_all=false. Here's its mapping:

"suggest": {
    "type": "string",
    "analyzer": "standard"
},
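A minimal sketch of how such a blob could be assembled before indexing. The function name `build_suggest_text` and the document keys `title`, `description` and `tags` are assumptions for illustration; the actual document structure is not shown here.

```python
def build_suggest_text(doc):
    """Join title, description and tags into one whitespace-separated blob.

    `doc` is assumed to be a dict with optional "title" and "description"
    strings and an optional "tags" list of strings (hypothetical layout).
    """
    parts = [doc.get("title", ""), doc.get("description", "")]
    parts.extend(doc.get("tags", []))
    # Skip empty or whitespace-only pieces so the blob has no stray gaps.
    return " ".join(p.strip() for p in parts if p and p.strip())

listing = {
    "title": "Oak table",
    "description": " Seats six ",
    "tags": ["furniture", "dining"],
}
print(build_suggest_text(listing))
```

The resulting string would be stored in the suggest field, where the standard analyzer keeps the words whole.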

Then, in my query, I search against _all for the actual results but use the suggest field for the suggestions:

{
  "query": {
     "match": {
         "_all": "tabel"
     }
  },
  "suggest": {
    "suggest-0": {
      "term": {
        "field": "suggest",
        "size": 5
      },
      "text": "tabls"
    }
  }
}
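If the request is built from application code, the body above could be generated along these lines. This is only a sketch: `build_search_body` is a hypothetical helper, and it assumes the same user input is sent to both the match query and the suggester, which is a simplification of the example above.

```python
import json

def build_search_body(user_text, size=5):
    # Hypothetical helper: search the stemmed _all field, but ask for
    # suggestions from the unstemmed "suggest" field.
    return {
        "query": {"match": {"_all": user_text}},
        "suggest": {
            "suggest-0": {
                "text": user_text,
                "term": {"field": "suggest", "size": size},
            }
        },
    }

body = build_search_body("tabls")
print(json.dumps(body, indent=2))
```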

Which gives:

{
    "took": 7,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 0,
        "max_score": null,
        "hits": []
    },
    "suggest": {
        "suggest-0": [
            {
                "text": "tabls",
                "offset": 0,
                "length": 5,
                "options": [
                    {
                        "text": "table",
                        "score": 0.8,
                        "freq": 858
                    },
                    {
                        "text": "tables",
                        "score": 0.8,
                        "freq": 682
                    },
                    {
                        "text": "tails",
                        "score": 0.8,
                        "freq": 4
                    },
                    {
                        "text": "tabs",
                        "score": 0.75,
                        "freq": 4
                    },
                    {
                        "text": "tools",
                        "score": 0.6,
                        "freq": 176
                    }
                ]
            }
        ]
    }
}

My UI code then knows to present a suggestion to the user so they can make better searches.
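As a rough illustration of that last step, the following sketch pulls a single suggestion out of a response shaped like the one above. The function name `pick_suggestion` is made up, and choosing the option with the highest score and then the highest frequency is my assumption about what a UI might want; the response shown above already appears to be ordered that way.

```python
def pick_suggestion(response, name="suggest-0"):
    """Return a corrected query string, or None if nothing was suggested."""
    entries = response.get("suggest", {}).get(name, [])
    words = []
    changed = False
    for entry in entries:
        options = entry.get("options", [])
        if options:
            # Assumed tie-break: highest score first, then highest frequency.
            best = max(options, key=lambda o: (o["score"], o["freq"]))
            words.append(best["text"])
            changed = True
        else:
            # No alternatives for this token, so keep it as typed.
            words.append(entry["text"])
    return " ".join(words) if changed else None

response = {
    "suggest": {
        "suggest-0": [
            {
                "text": "tabls",
                "offset": 0,
                "length": 5,
                "options": [
                    {"text": "table", "score": 0.8, "freq": 858},
                    {"text": "tables", "score": 0.8, "freq": 682},
                    {"text": "tails", "score": 0.8, "freq": 4},
                    {"text": "tabs", "score": 0.75, "freq": 4},
                    {"text": "tools", "score": 0.6, "freq": 176},
                ],
            }
        ]
    }
}
print(pick_suggestion(response))
```

When every entry has an empty options list the function returns None, so the UI can skip the "did you mean" prompt.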

Licensed under: CC-BY-SA with attribution