Question

I'm trying to implement an autocomplete feature from phrases that contain multiple words.

I want to be able to match only the beginning of words (edgeNGram?), but for every word searched.

For example, if I search for "monitor", I should receive all phrases that contain the word monitor, but if I search for "onitor", I should get no matches (from the dataset below). Also, a search for "mon ap" should give me "APNEA MONITOR- SCHULTE Vital Signs Monitor", for example, and "mon rrr" should in turn give no results.

So my question is: how should I go about implementing it?

So in short: the matching phrases should contain words that start with the terms searched for.
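To pin down the requirement, here is a plain-Python statement of the matching rule, with no Elasticsearch involved: a phrase matches when every search term is a prefix of some word in the phrase. Splitting words on anything that is not a letter or digit, and ignoring case, are assumptions of this sketch, not something Elasticsearch does for you.

```python
# Plain-Python statement of the desired rule (not Elasticsearch):
# a phrase matches when EVERY search term is a prefix of SOME word in it.
import re


def words(text: str) -> list[str]:
    # Assumption: words are runs of letters/digits; case is ignored.
    return [w for w in re.split(r"[^0-9a-z]+", text.lower()) if w]


def matches(phrase: str, query: str) -> bool:
    phrase_words = words(phrase)
    return all(
        any(word.startswith(term) for word in phrase_words)
        for term in words(query)
    )


phrase = "APNEA MONITOR- SCHULTE Vital Signs Monitor"
print(matches(phrase, "monitor"))  # True
print(matches(phrase, "onitor"))   # False
print(matches(phrase, "mon ap"))   # True
print(matches(phrase, "mon rrr"))  # False
```

The edgeNGram approach below is one way to get Elasticsearch to evaluate this rule at index time instead of scanning every phrase per query.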

Here is my mapping:

{
    "quicksearch2" : {
        "results" : {
            "properties" : {       
                "phrase" : {
                    "type" : "string",
                    "index_analyzer" : "quicksearch_index_analyzer",
                    "search_analyzer" : "quicksearch_search_analyzer"
                }        
            }
        }
    }
}

And here are my settings:

{
    "quicksearch2" : {
        "settings" : {
            "index.analysis.analyzer.quicksearch_index_analyzer.filter.4" : "left_ngram",
            "index.analysis.analyzer.quicksearch_search_analyzer.filter.3" : "unique",
            "index.analysis.analyzer.quicksearch_index_analyzer.filter.3" : "unique",
            "index.analysis.filter.left_ngram.max_gram" : "20",
            "index.analysis.analyzer.quicksearch_search_analyzer.filter.2" : "asciifolding",
            "index.analysis.analyzer.quicksearch_search_analyzer.tokenizer" : "keyword",
            "index.analysis.analyzer.quicksearch_search_analyzer.filter.1" : "lowercase",
            "index.number_of_replicas" : "0",
            "index.analysis.analyzer.quicksearch_search_analyzer.filter.0" : "trim",
            "index.analysis.filter.left_ngram.type" : "edgeNGram",
            "index.analysis.analyzer.quicksearch_search_analyzer.type" : "custom",
            "index.analysis.analyzer.quicksearch_index_analyzer.filter.0" : "trim",
            "index.analysis.analyzer.quicksearch_index_analyzer.filter.2" : "asciifolding",
            "index.analysis.analyzer.quicksearch_index_analyzer.filter.1" : "lowercase",
            "index.analysis.analyzer.quicksearch_index_analyzer.type" : "custom",
            "index.analysis.filter.left_ngram.side" : "front",
            "index.analysis.analyzer.quicksearch_index_analyzer.tokenizer" : "keyword",
            "index.number_of_shards" : "1",
            "index.version.created" : "900899",
            "index.uuid" : "Lb7vC-eHQB-u_Okm3ERLow"
        }
    }
}

Here is my query:

query: {
    match: {
        phrase: {
            query: term,        // term holds the user's search input
            operator: 'and'
        }
    }
}

Some sample data:

{
    "took" : 133,
    "timed_out" : false,
    "_shards" : {
        "total" : 1,
        "successful" : 1,
        "failed" : 0
    },
    "hits" : {
        "total" : 6197,
        "max_score" : 1.491863,
        "hits" : [ {
            "_index" : "quicksearch2",
            "_type" : "results",
            "_id" : "emCydgTfQwuKkl4sSZoosQ",
            "_score" : 1.491863,
            "fields" : {
                "phrase" : "APNEA MONITOR- SCHULTE Apnea Monitor"
            }
        }, {
            "_index" : "quicksearch2",
            "_type" : "results",
            "_id" : "AXCO5rUxRwC9SebXcQxXeQ",
            "_score" : 1.491863,
            "fields" : {
                "phrase" : "APNEA MONITOR- SCHULTE Apnea Monitor, Neonatal"
            }
        }, {
            "_index" : "quicksearch2",
            "_type" : "results",
            "_id" : "tjJq3klPTsmP8akOc18Htw",
            "_score" : 1.491863,
            "fields" : {
                "phrase" : "APNEA MONITOR- SCHULTE Apnea Monitor, Recording"
            }
        }, {
            "_index" : "quicksearch2",
            "_type" : "results",
            "_id" : "-FjKWxl9Rm6-byn-wlpoIw",
            "_score" : 1.491863,
            "fields" : {
                "phrase" : "APNEA MONITOR- SCHULTE Cardiorespiratory Monitor"
            }
        }, {
            "_index" : "quicksearch2",
            "_type" : "results",
            "_id" : "Q19k6V6VQ6ulZOLCfESQ6w",
            "_score" : 1.491863,
            "fields" : {
                "phrase" : "APNEA MONITOR- SCHULTE Impedance Pneumograph Bedside Monitor"
            }
        }, {
            "_index" : "quicksearch2",
            "_type" : "results",
            "_id" : "YLI1er3cRjSyGumWNVi0pg",
            "_score" : 1.491863,
            "fields" : {
                "phrase" : "APNEA MONITOR- SCHULTE Impedance Pneumograph Monitor"
            }
        }, {
            "_index" : "quicksearch2",
            "_type" : "results",
            "_id" : "n5j1SaXeS2W6NymaYAYD6A",
            "_score" : 1.491863,
            "fields" : {
                "phrase" : "APNEA MONITOR- SCHULTE Neonatal Monitor"
            }
        }, {
            "_index" : "quicksearch2",
            "_type" : "results",
            "_id" : "U7Q5XrrHRbKOIwfRWO6RTQ",
            "_score" : 1.491863,
            "fields" : {
                "phrase" : "APNEA MONITOR- SCHULTE Pulmonary Function Monitor"
            }
        }, {
            "_index" : "quicksearch2",
            "_type" : "results",
            "_id" : "aF_THiCKRIyzunCbBxJTEg",
            "_score" : 1.491863,
            "fields" : {
                "phrase" : "APNEA MONITOR- SCHULTE Vital Signs Monitor"
            }
        }, {
            "_index" : "quicksearch2",
            "_type" : "results",
            "_id" : "8BAjZfwMQjWmrkqCO7o6gg",
            "_score" : 1.491863,
            "fields" : {
                "phrase" : "P.P.M. - PORTABLE PRECISION MONITOR Gas Monitor, Atmospheric"
            }
        } ]
    }
}

Solution 2

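Changing the tokenizers (both index and search) from keyword to standard seems to have done the trick. The keyword tokenizer emits the whole phrase as a single token, so the edgeNGram filter only produced prefixes of the entire phrase (such as "apnea mon") and never of the later words; on the search side it also kept "mon ap" as one term. The standard tokenizer splits the phrase into words, so every word gets its own prefixes and every search word is matched separately.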
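A rough pure-Python model of the question's index analyzer makes the difference visible. This is not Elasticsearch: trim, asciifolding and unique are left out, min_gram is assumed to be 1 (the question leaves it unset), and the standard tokenizer is approximated by splitting on runs of non-alphanumeric characters.

```python
# Rough pure-Python model (not Elasticsearch) of the question's index analyzer:
# tokenizer -> lowercase -> front edge n-grams (max_gram=20; min_gram assumed 1).
import re


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 20) -> list[str]:
    # The leading min_gram..max_gram characters of the token.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def keyword_tokenize(text: str) -> list[str]:
    # keyword tokenizer: the whole input becomes one token.
    return [text]


def standard_like_tokenize(text: str) -> list[str]:
    # Approximation of the standard tokenizer: one token per word.
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def index_terms(text: str, tokenize) -> set[str]:
    terms: set[str] = set()
    for token in tokenize(text):
        terms.update(edge_ngrams(token.lower()))
    return terms


phrase = "APNEA MONITOR- SCHULTE Vital Signs Monitor"
kw = index_terms(phrase, keyword_tokenize)
std = index_terms(phrase, standard_like_tokenize)
print("mon" in kw, "mon" in std)  # False True
print("apnea mon" in kw)          # True: a prefix of the whole phrase
```

With the keyword tokenizer only a query that starts the same way as the whole phrase can match, which is why "apnea" worked but "monitor" and "mon ap" did not.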

OTHER TIPS

I'm not quite sure why what you're doing isn't working, but here is a method that seems to do what you are wanting.

I created an index with these settings:

curl -XPUT "http://localhost:9200/test_index" -d'
{
   "settings": {
      "analysis": {
         "filter": {
            "my_edge_ngram_filter": {
               "type": "edgeNGram",
               "min_gram": 2,
               "max_gram": 20
            }
         },
         "analyzer": {
            "my_ngram_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding",
                  "my_edge_ngram_filter"
               ]
            },
            "my_whitespace_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding"
               ]
            }
         }
      }
   },
   "mappings": {
      "docs": {
         "properties": {
            "phrase": {
               "type": "string",
               "index_analyzer": "my_ngram_analyzer",
               "search_analyzer": "my_whitespace_analyzer"
            }
         }
      }
   }
}'
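To see what these two analyzers produce, here is an approximate pure-Python model of them (not Elasticsearch itself). Approximating asciifolding with Unicode NFKD decomposition plus dropping combining marks is an assumption of this sketch; the real filter covers more characters.

```python
# Approximate model of my_whitespace_analyzer and my_ngram_analyzer above.
import unicodedata


def fold(text: str) -> str:
    # Stand-in for asciifolding: decompose, then drop combining accents.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def my_whitespace_analyzer(text: str) -> list[str]:
    # whitespace tokenizer -> lowercase -> asciifolding
    return [fold(token.lower()) for token in text.split()]


def my_ngram_analyzer(text: str, min_gram: int = 2, max_gram: int = 20) -> list[str]:
    # Same chain, then front edge n-grams of each token.
    grams: list[str] = []
    for token in my_whitespace_analyzer(text):
        grams.extend(token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1))
    return grams


print(my_whitespace_analyzer("mon ap"))  # ['mon', 'ap']
print(my_ngram_analyzer("Apnea Monitor-"))
# ['ap', 'apn', 'apne', 'apnea', 'mo', 'mon', 'moni', 'monit', 'monito', 'monitor', 'monitor-']
```

Two consequences of this setup: the whitespace tokenizer keeps punctuation such as "monitor-", which is harmless here because "monitor" is still one of its prefixes; and with min_gram set to 2, a one-letter query like "m" matches nothing.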

then added the docs you listed:

curl -XPOST "http://localhost:9200/test_index/_bulk" -d'
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : "1" } }
{ "phrase" : "APNEA MONITOR- SCHULTE Apnea Monitor" }
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : "2" } }
{ "phrase" : "APNEA MONITOR- SCHULTE Apnea Monitor, Neonatal" }
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : "3" } }
{ "phrase" : "APNEA MONITOR- SCHULTE Apnea Monitor, Recording" }
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : "4" } }
{ "phrase" : "APNEA MONITOR- SCHULTE Cardiorespiratory Monitor" }
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : "5" } }
{ "phrase" : "APNEA MONITOR- SCHULTE Impedance Pneumograph Bedside Monitor" }
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : "6" } }
{ "phrase" : "APNEA MONITOR- SCHULTE Impedance Pneumograph Monitor" }
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : "7" } }
{ "phrase" : "APNEA MONITOR- SCHULTE Neonatal Monitor" }
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : "8" } }
{ "phrase" : "APNEA MONITOR- SCHULTE Pulmonary Function Monitor" }
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : "9" } }
{ "phrase" : "APNEA MONITOR- SCHULTE Vital Signs Monitor" }
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : "10" } }
{ "phrase" : "P.P.M. - PORTABLE PRECISION MONITOR Gas Monitor, Atmospheric" }
'
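For a larger dataset (the question mentions 6197 hits) you would generate this bulk body rather than type it. A minimal sketch, assuming the same index and type names as above; the function name bulk_body is hypothetical, and sending the body to /test_index/_bulk with an HTTP client is left out.

```python
# Builds a newline-delimited bulk body like the one in the curl call above.
import json


def bulk_body(phrases, index="test_index", doc_type="docs"):
    lines = []
    for i, phrase in enumerate(phrases, start=1):
        # Action line, then the document source line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": str(i)}}))
        lines.append(json.dumps({"phrase": phrase}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


print(bulk_body([
    "APNEA MONITOR- SCHULTE Apnea Monitor",
    "P.P.M. - PORTABLE PRECISION MONITOR Gas Monitor, Atmospheric",
]))
```

json.dumps adds spaces after separators, so the output is not byte-for-byte identical to the hand-written body, but it is equivalent JSON.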

And the following searches seem to return the results you are expecting:

curl -XPOST "http://localhost:9200/test_index/_search" -d'
{
    "query": {
        "match": {
           "phrase" : {
               "query": "monitor",
               "operator": "and"
           }
        }
    }
}'

returns all the docs,

curl -XPOST "http://localhost:9200/test_index/_search" -d'
{
    "query": {
        "match": {
           "phrase" : {
               "query": "onitor",
               "operator": "and"
           }
        }
    }
}'

doesn't return any, and

curl -XPOST "http://localhost:9200/test_index/_search" -d'
{
    "query": {
        "match": {
           "phrase" : {
               "query": "mon ap",
               "operator": "and"
           }
        }
    }
}'

returns all but document "10".
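Why these results come out as they do can be checked with a small pure-Python model of a match query with "operator": "and" (not Elasticsearch): every term from the search analyzer must be among the edge n-grams indexed for the document. asciifolding is left out because the sample data is plain ASCII, and scoring is ignored.

```python
# Pure-Python model of the match query ("operator": "and") over the ten docs above.
DOCS = {
    "1": "APNEA MONITOR- SCHULTE Apnea Monitor",
    "2": "APNEA MONITOR- SCHULTE Apnea Monitor, Neonatal",
    "3": "APNEA MONITOR- SCHULTE Apnea Monitor, Recording",
    "4": "APNEA MONITOR- SCHULTE Cardiorespiratory Monitor",
    "5": "APNEA MONITOR- SCHULTE Impedance Pneumograph Bedside Monitor",
    "6": "APNEA MONITOR- SCHULTE Impedance Pneumograph Monitor",
    "7": "APNEA MONITOR- SCHULTE Neonatal Monitor",
    "8": "APNEA MONITOR- SCHULTE Pulmonary Function Monitor",
    "9": "APNEA MONITOR- SCHULTE Vital Signs Monitor",
    "10": "P.P.M. - PORTABLE PRECISION MONITOR Gas Monitor, Atmospheric",
}


def search_terms(text: str) -> list[str]:
    # my_whitespace_analyzer without asciifolding.
    return text.lower().split()


def index_terms(text: str, min_gram: int = 2, max_gram: int = 20) -> set[str]:
    # my_ngram_analyzer: edge n-grams of every whitespace token.
    terms: set[str] = set()
    for token in search_terms(text):
        terms.update(token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1))
    return terms


def match_and(query: str) -> list[str]:
    wanted = search_terms(query)
    if not wanted:
        # A match query with no terms matches nothing by default.
        return []
    return [doc_id for doc_id, phrase in DOCS.items()
            if all(term in index_terms(phrase) for term in wanted)]


print(match_and("monitor"))  # all ten ids
print(match_and("onitor"))   # []
print(match_and("mon ap"))   # ids "1" through "9"
print(match_and("mon rrr"))  # []
```

Document "10" drops out of "mon ap" because none of its words ("portable", "precision", "gas", "atmospheric", ...) starts with "ap".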

Here is a runnable example you can play with (you will need ES installed and running at localhost:9200, or supply another endpoint): http://sense.qbox.io/gist/19fdcdb20c24436c64b7656c3b8002fe78667b12

Licensed under: CC-BY-SA with attribution