Is there a way to have Elasticsearch return a hit per generated bucket during an aggregation?

StackOverflow https://stackoverflow.com/questions/22369800

  •  13-06-2023

Question

Right now I have a query like this:

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "uuid": "xxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxxx"
                    }
                },
                {
                    "range": {
                        "date": {
                            "from": "now-12h",
                            "to": "now"
                        }
                    }
                }
            ]
        }
    },
    "aggs": {
        "query": {
            "terms": [
                {
                    "field": "query",
                    "size": 3
                }
            ]
        }
    }
}

The aggregation works perfectly well, but I can't find a way to control the hit data that is returned. I can use the size parameter at the top of the DSL, but the hits are not returned in the same order as the buckets, so the bucket results do not line up with the hit results. Is there any way to correct this, or do I have to issue two separate queries?

Solution

To expand on Filipe's answer, it seems like the top_hits aggregation is what you are looking for, e.g.

{
  "query": {
    ... snip ...
  },
  "aggs": {
    "query": {
      "terms": {
        "field": "query",
        "size": 3
      },
      "aggs": {
        "top": {
          "top_hits": {
            "size": 42
          }
        }
      }
    }
  }
}
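
To read the result back, a minimal Python sketch follows. It assumes the response layout that a terms aggregation with a nested top_hits sub-aggregation produces (aggregations.<name>.buckets[].<sub-name>.hits.hits), using the names "query" and "top" from the request above. The sample response is hand-written for illustration rather than real output, and hits_per_bucket is a hypothetical helper name.

```python
def hits_per_bucket(response, agg_name="query", top_name="top"):
    """Return (bucket key, doc count, [hit _source, ...]) tuples in bucket order."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [
        (b["key"], b["doc_count"], [h["_source"] for h in b[top_name]["hits"]["hits"]])
        for b in buckets
    ]


# Hand-written sample shaped like a terms + top_hits response (illustrative only).
sample_response = {
    "aggregations": {
        "query": {
            "buckets": [
                {"key": "a", "doc_count": 2,
                 "top": {"hits": {"hits": [{"_source": {"query": "a", "n": 1}},
                                           {"_source": {"query": "a", "n": 2}}]}}},
                {"key": "b", "doc_count": 1,
                 "top": {"hits": {"hits": [{"_source": {"query": "b", "n": 3}}]}}},
            ]
        }
    }
}

for key, count, docs in hits_per_bucket(sample_response):
    print(key, count, docs)
```

Because the hits are nested inside each bucket, they stay aligned with the bucket order by construction, which is the pairing the question asks for.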

Other tips

Your query uses exact matches (match and range) and binary logic (must, bool) and thus should probably be converted to use filters instead:

"filtered": {
 "filter": {
    "bool": {
       "must": [
          {
             "term": {
                "uuid": "xxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxxx"
             }
          },
          {
             "range": {
                "date": {
                   "from": "now-12h",
                   "to": "now"
                }
             }
          }
       ]
    }
 }
}
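
The filtered query belongs to the Elasticsearch 1.x era of this thread; it was deprecated in 2.0 and removed in 5.0, where a bool query with a filter clause takes its place. As a hedged sketch of that equivalent, the Python below builds the request body as a plain dict. The gte/lte range keys and the build_filter_query helper name are choices made here, not part of the original answer.

```python
import json


def build_filter_query(uuid, since="now-12h", until="now"):
    """Build a bool query whose clauses run in filter context (no scoring)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"uuid": uuid}},
                    {"range": {"date": {"gte": since, "lte": until}}},
                ]
            }
        }
    }


body = build_filter_query("xxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxxx")
print(json.dumps(body, indent=2))
```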

As for the aggregations,

"The hits that are returned do not represent all the buckets that were returned. So if I have buckets for terms 'a', 'b', and 'c', I want to have hits that represent those buckets as well."

Perhaps you are looking to control the scope of the buckets? You can make an aggregation bucket global so that it will not be influenced by the query or filter.
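
As a rough illustration of a global bucket, the sketch below builds a request body in which the terms aggregation is wrapped in a global aggregation, so its buckets are computed over all documents rather than only those matching the query. The aggregation name "all_docs" and the helper with_global_terms are hypothetical; the global aggregation itself takes an empty body and must sit at the top level of "aggs".

```python
import json


def with_global_terms(query, field="query", size=3):
    """Wrap a terms aggregation in a global bucket that ignores the search query."""
    return {
        "query": query,
        "aggs": {
            "all_docs": {          # hypothetical name for the global bucket
                "global": {},      # empty body: the bucket covers every document
                "aggs": {
                    "query": {"terms": {"field": field, "size": size}}
                },
            }
        },
    }


body = with_global_terms({"match": {"uuid": "xxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxxx"}})
print(json.dumps(body, indent=2))
```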

Keep in mind that Elasticsearch will not "group" hits in any way -- it is always a flat list ordered according to score and additional sorting options.

Aggregations can be organized in a nested structure and return computed or extracted values, in a specific order. In the case of terms aggregation, it is in descending count (highest number of hits first). The hits section of the response is never influenced by your choice of aggregations. Similarly, you cannot find hits in the aggregation sections.

If your goal is to group documents by a certain field, yes, you will need to run multiple queries in the current Elasticsearch release.
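
The multiple-query approach can be sketched as follows: run the terms aggregation first, then issue one follow-up search per bucket key. The Python below only builds the follow-up request bodies and does not talk to a cluster. It assumes the base query is a bool query with a "must" list, as in the question, and that a term clause on the aggregated field selects the documents of a bucket. per_bucket_bodies is a hypothetical helper name.

```python
import copy


def per_bucket_bodies(base_query, field, bucket_keys, size=1):
    """Build one search body per bucket key by adding a term clause to the base query."""
    bodies = []
    for key in bucket_keys:
        query = copy.deepcopy(base_query)  # leave the caller's query untouched
        query["bool"]["must"].append({"term": {field: key}})
        bodies.append({"query": query, "size": size})
    return bodies


base_query = {
    "bool": {
        "must": [
            {"match": {"uuid": "xxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxxx"}},
        ]
    }
}

# Bucket keys as they would come back from the first (aggregation) request.
bodies = per_bucket_bodies(base_query, "query", ["a", "b", "c"])
for body in bodies:
    print(body)
```

Sending the bodies in bucket order keeps the follow-up hits aligned with the buckets, at the cost of one extra request per bucket.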

I'm not 100% sure, but I think there's no way to do that in the current version of Elasticsearch (1.2.x). The good news is that there will be when version 1.3.x gets released:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow