Question

I can't figure out how to get Elasticsearch (accessed via pyes) to match plural/singular terms. For instance, when I enter Monkies, I'd like to get results back that have Belt. I've looked at "Elasticsearch not returning singular/plural matches" but can't make sense of it. Here are some curl statements:

curl -XDELETE localhost:9200/myindex

curl -XPOST localhost:9200/myindex -d '
{"index":
  { "number_of_shards": 1,
    "analysis": {
       "filter": {
           "myfilter": {
               "type" : "porter_stem",
               "language" : "English"
           }
       },
       "analyzer": {
           "default" : {
               "tokenizer" : "nGram",
               "filter" : ["lowercase", "myfilter"]
           },
           "index_analyzer" : {
               "tokenizer" : "nGram",
               "filter" : ["lowercase", "myfilter"]
           },
           "search_analyzer" : {
               "tokenizer" : "nGram",
               "filter" : ["lowercase", "myfilter"]
           }
       }
    }
  }
}'

curl -XPUT localhost:9200/myindex/mytype/_mapping -d '{
    "tweet" : {
        "date_formats" : ["yyyy-MM-dd", "dd-MM-yyyy"],
        "properties" : {
            "user": {"type":"string"},
            "post_date": {"type": "date"},
            "message" : {"type" : "string", "analyzer": "search_analyzer"}
        }
    }}'

curl -XPUT 'http://localhost:9200/myindex/mytype/1' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "belt knife is a cool thing"
}'

curl -XPUT 'http://localhost:9200/myindex/mytype/2' -d '{
"user" : "alwild",
"post_date" : "2009-11-15T14:12:12",
"message" : "second message with nothing else"
}'

curl -XGET 'localhost:9200/myindex/mytype/_search?q=message:belts'

I've got it to the point where searching for belts gives me some results, but now it gives too many. What do I have to do to get it to return only the one entry that has "belt" in it?


Solution

By default, your query is executed against the _all field, which uses the standard analyzer, so no stemming is applied. Try searching with a field-qualified query such as name:Monkies. For production purposes, use the match query, which analyzes the query text with the analyzer configured for that field in the mapping.
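
As a rough sketch of what a match query body could look like for the question's data: this assumes the field is `message` (taken from the mapping in the question) and only builds the JSON with the standard library. `build_match_query` is a made-up helper name, not part of pyes or Elasticsearch.

```python
import json


def build_match_query(field, text):
    """Return a search request body holding a match query on one field.

    Hypothetical helper for illustration; it does not talk to a server.
    """
    return {"query": {"match": {field: text}}}


# "message" and "belts" come from the question's example.
body = build_match_query("message", "belts")
print(json.dumps(body))
```

The printed JSON could then be sent as the request body to the index's _search endpoint, for example with curl -d as in the question.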

Elasticsearch makes it very easy to compare different analysis settings, by the way. Compare:

http://localhost:9200/_analyze?text=Monkies&analyzer=standard

vs

http://localhost:9200/_analyze?text=Monkies&analyzer=snowball
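
If you want to compare more than two analyzers, the URLs above can be generated instead of typed by hand. This is only a sketch: `analyze_url` is a hypothetical helper, the host is assumed to be the local default used in this thread, and it just builds the URLs without calling the server.

```python
from urllib.parse import urlencode


def analyze_url(text, analyzer, host="http://localhost:9200"):
    """Build an _analyze URL for the given text and analyzer name.

    Hypothetical helper; the host default is an assumption.
    """
    params = urlencode({"text": text, "analyzer": analyzer})
    return f"{host}/_analyze?{params}"


# The two analyzers compared in the answer.
urls = [analyze_url("Monkies", name) for name in ("standard", "snowball")]
for url in urls:
    print(url)
```

Opening each URL (or passing it to curl) shows the tokens that analyzer produces for the same input.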

Other tips

Can you reduce this to a few curl invocations that create your index with this mapping, index some data, and perform a search that shows the results you weren't expecting?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow