Question

I'm receiving requests/events from a large number of client applications. I'd like to use Elasticsearch to find out when my highest traffic point is.

One thing I've tried is a filter aggregation with a nested date histogram and, inside that, a nested "terms" aggregation that gets the distinct hour of the day via a script. The following is my attempt, and it performs terribly (as I'd expect, since I'm executing a script per document).

{
  "aggs": {
    "sites_within_range": {
      "filter" : { 
        "range" : { 
          "occurred" : { 
            "gt" : "now-1M"
          }
        } 
      },

      "aggs": {
        "sites_over_time": {
          "date_histogram": {
            "field": "occurred",
            "interval": "week"
          },
          "aggs":{
            "site_names": {
              "terms": {
                "script": "doc['occurred'].date.getHourOfDay()",
                "size": 10000
              }
            }
          }
        }
      }

    }
  }
}

I've also considered storing the date elements I want to query as distinct parts of the document, e.g.:

{
    "date": "actual datetime",
    "day": "monday",
    "hour": 8,
    "minute": 37
}

This also smells like the wrong answer to me.


Edit: after some investigation, it looks like I might be interested in the new cardinality / percentiles aggregations coming in 1.1?


Solution

The same kind of problem has been solved in this thread.

Adapting the solution to your problem, we need a script that converts the date into the hour of day:

Date date = new Date(doc['occurred'].value);
java.text.SimpleDateFormat format = new java.text.SimpleDateFormat('HH');
format.format(date)

And use it in a query:

{
    "aggs": {
        "perWeekDay": {
            "filter" : {
                "range" : {
                    "occurred" : {
                        "gt" : "now-1M"
                    }
                }
            },
            "aggs": {
                "hour_of_day": {
                    "terms": {
                        "script": "Date date = new Date(doc['occurred'].value); java.text.SimpleDateFormat format = new java.text.SimpleDateFormat('HH'); format.format(date)"
                    }
                }
            }
        }
    }
}

And you have the traffic by hour of day.
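
To get from those buckets to the actual peak, the client only has to pick the bucket with the largest doc_count. The following is a minimal Python sketch of that step, not part of the original answer. The response below is hand-written for illustration: the sub-aggregation name "hour_of_day" and all counts are made up, and the overall shape assumes the usual filter-plus-terms aggregation response.

```python
def peak_hour(buckets):
    """Return (key, doc_count) of the bucket with the most documents, or None."""
    if not buckets:
        return None
    top = max(buckets, key=lambda b: b["doc_count"])
    return top["key"], top["doc_count"]


# Hypothetical response body; names and numbers are illustrative only.
sample_response = {
    "aggregations": {
        "perWeekDay": {
            "doc_count": 450,
            "hour_of_day": {
                "buckets": [
                    {"key": "14", "doc_count": 210},
                    {"key": "08", "doc_count": 150},
                    {"key": "20", "doc_count": 90},
                ]
            },
        }
    }
}

buckets = sample_response["aggregations"]["perWeekDay"]["hour_of_day"]["buckets"]
busiest = peak_hour(buckets)
print(busiest)
```

A terms aggregation orders its buckets by doc_count, highest first, unless told otherwise, so the first bucket is normally already the peak. Taking the max explicitly just avoids depending on that ordering.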

Nota bene: Storing the hours/days/minutes in your document is the most efficient way of doing that kind of aggregation. My answer assumes you don't want to store that information. Scripts usually aren't über efficient.
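
If you do go the stored-fields route, the extra fields can be derived on the client just before indexing. This is a rough Python sketch under a few assumptions: the timestamp arrives as an ISO 8601 string in a field called "occurred", the helper name with_date_parts is invented for this example, and the hour is taken in UTC, which may not be what you want.

```python
from datetime import datetime, timezone

DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday"]


def with_date_parts(event, field="occurred"):
    """Return a copy of event with day/hour/minute derived from event[field]."""
    ts = datetime.fromisoformat(event[field])
    if ts.tzinfo is not None:
        # Normalise to UTC so every document uses the same clock (assumption).
        ts = ts.astimezone(timezone.utc)
    enriched = dict(event)
    enriched["day"] = DAY_NAMES[ts.weekday()]  # weekday(): Monday is 0
    enriched["hour"] = ts.hour
    enriched["minute"] = ts.minute
    return enriched


doc = with_date_parts({"occurred": "2014-03-10T08:37:00+00:00"})
print(doc)
```

With an "hour" field in every document, the terms aggregation can point at "field": "hour" instead of running a script per document.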

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow