Question

I need to group by _id and by country. I've managed to group by _id, but I'd like to know how to group the visits within each _id by country and return the count for each country.

I am using the aggregation framework. So far so good.

conn = Mongo::Connection.new
db   = conn['foobar_development']

cmd = {
  aggregate: 'live_daily_stats',
  pipeline: [
    { '$project' => {
      :metacontent => 1,
      :visits => 1,
    } },
    { '$unwind' => '$visits' },
    { '$match' => { 'visits.minute' => { '$gt' => 224 } } },
    { '$sort' => { 'visits.minute' => 1 } },
    { '$group' => { 
      :_id => '$_id', 
      :visits => { '$push' => '$visits' }, 
      :visits_count => { '$sum' => 1 },
      :metacontent => { '$addToSet' => '$metacontent' },
      } 
    },
    { '$sort' => { 'visits_count' => -1 } },
  ]
}

res = db.command(cmd)['result']

This returns:

[
    [0] {
                 "_id" => "20120726/foobar/song/custom-cred",
              "visits" => [
            [0] {
                                              "country_name" => "UK",
                               "iso_two_letter_country_code" => "UK",
                                                   "referer" => "http://localhost:3000/",
                                                    "minute" => 59,
                                                  "token_id" => "134326199711wfryhpdq"
            },
            [1] {
                                              "country_name" => "UK",
                               "iso_two_letter_country_code" => "UK",
                                                   "referer" => "http://localhost:3000/",
                                                    "minute" => 59,
                                                  "token_id" => "134326199711wfryhpdq"
            },
            [2] {
                                              "country_name" => "US",
                               "iso_two_letter_country_code" => "US",
                                                   "referer" => "http://localhost:3000/",
                                                    "minute" => 59,
                                                  "token_id" => "134326199711wfryhpdq"
            }
        ],
        "visits_count" => 1,
         "metacontent" => [
            [0] {
                                     "date" => "20120726"
            }
        ]
    },
    [1] {
                 "_id" => "20120725/foobar/song/test-pg3-long-title-here-test-lorem-ipsum-dolor-lo",
              "visits" => [
            [0] {
                                              "country_name" => "UK",
                               "iso_two_letter_country_code" => "UK",
                                                   "referer" => "http://localhost:3000/",
                                                    "minute" => 58,
                                                  "token_id" => "13432600883knjzcbic"
            }
        ],
        "visits_count" => 1,
         "metacontent" => [
            [0] {
                                     "date" => "20120725"
            }
        ]
    }
]

Solution

I changed the $group key to concatenate _id and country_name, so each group is one page/country pair and visits_count is the number of visits for that pair:

cmd = {
  aggregate: 'live_daily_stats',
  pipeline: [
    { '$project' => {
      :metacontent => 1,
      :visits => 1,
    } },
    { '$unwind' => '$visits' },
    { '$match' => { 'visits.minute' => { '$gt' => 224 } } },
    { '$sort' => { 'visits.minute' => 1 } },
    { '$group' => { 
      :_id => { '$add' => ['$_id', '$visits.country_name']}, 
      :visits => { '$push' => '$visits' }, 
      :visits_count => { '$sum' => 1 },
      :metacontent => { '$addToSet' => '$metacontent' },
      } 
    },
    { '$sort' => { 'visits_count' => -1 } },
  ]
}
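The concatenated key works for grouping, but it leaves the page and the country fused into one string. As a rough sketch of an alternative: the documentation says `_id` may be a document with multiple fields, so the two values can be kept apart in a compound key. Also worth checking against your server version: as far as I know, only early builds let `$add` join strings, and later releases restrict it to numbers and dates and offer `$concat` for strings. The helper name `per_country_pipeline` and the key names `page_id` and `country` are my own choices, not part of the original code.

```ruby
# Hypothetical helper: builds a pipeline that groups on a compound key.
# The key names :page_id and :country are illustrative, not from the question.
def per_country_pipeline(min_minute)
  [
    { '$project' => { :visits => 1 } },
    { '$unwind' => '$visits' },
    { '$match' => { 'visits.minute' => { '$gt' => min_minute } } },
    { '$group' => {
      # A document as _id: one group per distinct (page, country) pair.
      :_id => { :page_id => '$_id', :country => '$visits.country_name' },
      :visits_count => { '$sum' => 1 },
    } },
    { '$sort' => { 'visits_count' => -1 } },
  ]
end

cmd = { aggregate: 'live_daily_stats', pipeline: per_country_pipeline(224) }
# With the driver from the question: res = db.command(cmd)['result']
```

Each result document then carries the page and the country as separate fields under `_id`, which is easier to read back than a joined string.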

Other tips

From the documentation

$group Groups documents together for the purpose of calculating aggregate values based on a collection of documents. Practically, group often supports tasks such as average page views for each page in a website on a daily basis.

The output of $group depends on how you define groups. Begin by specifying an identifier (i.e. a _id field) for the group you’re creating with this pipeline. You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields.

Every group expression must specify an _id field. You may specify the _id field as a dotted field path reference, a document with multiple fields enclosed in braces (i.e. { and }), or a constant value.

I'd try grouping by both _id and country first (which gives you the per-country count), then grouping that result by _id alone to get the structure you want.
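To make the two passes concrete, here is a small plain-Ruby imitation that runs without a database. It only mimics what the two `$group` stages would do. The rows reuse the page ids and country names from the output in the question, and the helper name `count_by_page_and_country` is made up for this sketch.

```ruby
# Rows shaped like the unwound documents: [page _id, country].
# The values are copied from the output shown in the question.
UNWOUND = [
  ['20120726/foobar/song/custom-cred', 'UK'],
  ['20120726/foobar/song/custom-cred', 'UK'],
  ['20120726/foobar/song/custom-cred', 'US'],
  ['20120725/foobar/song/test-pg3-long-title-here-test-lorem-ipsum-dolor-lo', 'UK'],
]

# Hypothetical helper that mirrors the two grouping passes in plain Ruby.
def count_by_page_and_country(rows)
  # Pass 1: count each (page, country) pair.
  pair_counts = Hash.new(0)
  rows.each { |page, country| pair_counts[[page, country]] += 1 }

  # Pass 2: regroup the pair counts under their page.
  by_page = Hash.new { |hash, key| hash[key] = {} }
  pair_counts.each { |(page, country), count| by_page[page][country] = count }
  by_page
end

p count_by_page_and_country(UNWOUND)
```

For these rows the first page ends up with a count of 2 for UK and 1 for US, and the second page with 1 for UK.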

Updated:

I was thinking of something like this, but I don't have an environment set up to check it:

    conn = Mongo::Connection.new
    db   = conn['foobar_development']

    cmd = {
      aggregate: 'live_daily_stats',
      pipeline: [
        { '$project' => {
          :metacontent => 1,
          :visits => 1,
        } },
        { '$unwind' => '$visits' },
        { '$match' => { 'visits.minute' => { '$gt' => 224 } } },
        { '$sort' => { 'visits.minute' => 1 } },
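        # First pass: one group per (page, country) pair, so the count is per country.
        { '$group' => {
          :_id => {
            :page_id => '$_id',
            :country => '$visits.iso_two_letter_country_code',
          },
          :visits_count => { '$sum' => 1 },
          # ... whatever you want ...
          :metacontent => { '$addToSet' => '$metacontent' },
          }
        },
        # Second pass: regroup by page; the page id now sits inside the compound _id.
        { '$group' => {
          :_id => '$_id.page_id',
          # ... whatever you want ...
          }
        },
        # Works only if the second $group emits a visits_count field.
        { '$sort' => { 'visits_count' => -1 } },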
      ]
    }

    res = db.command(cmd)['result']
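
One way the "whatever you want" placeholders could be filled in, again untested against a real server: the first pass counts per page/country pair, and the second pass pushes those counts into an array per page while summing them into a page total. The helper name `two_pass_group_stages` and the output fields `countries`, `country`, and `count` are assumptions of mine, and I'm assuming a server version that accepts a document expression inside `$push`.

```ruby
# Hypothetical helper: the two $group stages with the placeholders filled in
# one possible way. The :countries, :country and :count names are assumptions.
def two_pass_group_stages
  [
    { '$group' => {
      :_id => { :page_id => '$_id', :country => '$visits.iso_two_letter_country_code' },
      :visits_count => { '$sum' => 1 },
    } },
    { '$group' => {
      :_id => '$_id.page_id',
      # One entry per country, carrying the count from the first pass.
      :countries => { '$push' => { :country => '$_id.country', :count => '$visits_count' } },
      # Page total: the sum of the per-country counts.
      :visits_count => { '$sum' => '$visits_count' },
    } },
    { '$sort' => { 'visits_count' => -1 } },
  ]
end
```

These stages would replace the two `$group` stages and the final `$sort` in the pipeline, after the `$unwind`, `$match`, and first `$sort`.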
Licensed under: CC-BY-SA with attribution