Question

How can I write an Elasticsearch terms aggregation query that groups by the entire field value rather than by individual tokens? For example, I would like to aggregate by city name, but the following query returns new, york, san and francisco as separate buckets instead of the expected new york and san francisco.

curl -XPOST "http://localhost:9200/cities/_search" -d'
{
  "size": 0,
  "aggs": {
    "cities": {
      "terms": {
        "field": "city",
        "min_doc_count": 10
      }
    }
  }
}'

Solution

You should fix this in your mapping by adding a not_analyzed field. If you also need the analyzed version, define it as a multi-field:

"city": {
  "type": "string",
  "fields": {
    "raw": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}

Now run your aggregation on city.raw.
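
As a sketch, the request body from the question then only needs the field name changed. This assumes the documents were reindexed after the mapping change, since documents indexed earlier do not get the new sub-field populated automatically. The body is sent to /cities/_search as before:

```json
{
  "size": 0,
  "aggs": {
    "cities": {
      "terms": {
        "field": "city.raw",
        "min_doc_count": 10
      }
    }
  }
}
```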

OTHER TIPS

Update (2018-02-11): we can now append .keyword to the name of the field being grouped on, according to this:

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}
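
The .keyword sub-field is available because Elasticsearch 5.x and later, by default, dynamically map a new string field as text with a keyword sub-field. A minimal sketch of the equivalent explicit mapping follows, assuming a version with typeless mappings (7.x or later; older versions nest properties under a type name) and sent as the body of PUT /bank when creating the index:

```json
{
  "mappings": {
    "properties": {
      "state": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
```

If the index was mapped explicitly without such a sub-field, state.keyword does not exist and the aggregation returns no buckets.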

This Elastic doc suggests fixing it in the mapping (as in the accepted answer): either make the field not_analyzed, or add a raw sub-field that is not_analyzed and use that in aggregations.

There is no other way. Aggregations operate on the inverted index, and if the field is analyzed, the inverted index holds only the tokens, not the original values of the field.
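
To check which tokens are produced for a given value, the _analyze API can help. A small sketch, assuming the field uses the default standard analyzer and a version that accepts a JSON body for this endpoint; the body is sent to GET /_analyze:

```json
{
  "analyzer": "standard",
  "text": "New York"
}
```

The response lists the tokens new and york, which match the bucket keys returned by the aggregation in the question.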

Licensed under: CC-BY-SA with attribution