I would tackle this at the mapping level first. A keyword tokenizer will emit your entire field value as a single token, and adding a lowercase filter on top of it lowercases that token, making the field case-insensitive:
"analysis":{
"analyzer":{
"analyzer_firstletter":{
"tokenizer":"keyword",
"filter":"lowercase"
}
}
After inserting some data, this is what the index holds:
$ curl -XGET localhost:9200/test2/tweet/_search -d '{
    "query": {
        "match_all": {}
    }
}' | grep title
"title" : "river dog"
"title" : "data"
"title" : "drive"
"title" : "drunk"
"title" : "dzone"
Note the entry "river dog", which is what you want to avoid matching. Now, if we use a match_phrase_prefix query, we'll only match titles that start with 'd':
$ curl -XGET localhost:9200/test2/tweet/_search -d '{
    "query": {
        "match_phrase_prefix": {
            "title": {
                "query": "d",
                "max_expansions": 5
            }
        }
    }
}' | grep title
"title" : "drive"
"title" : "drunk"
"title" : "dzone"
"title" : "data"
This isn't Elastica specific, but it should be fairly easy to translate into the corresponding Elastica calls. The important parts are the keyword + lowercase analyzer and the match_phrase_prefix query.
As a side note, wildcard queries are very slow and best avoided where possible :)