How do I group by _id and country?
11-12-2019
Question
I need to group by _id and country. I've managed to group by _id, but I would like to know how to group the countries within each _id and return the count for each country.
I am using the aggregation framework. This is what I have so far:
require 'mongo'

conn = Mongo::Connection.new
db = conn['foobar_development']
cmd = {
  aggregate: 'live_daily_stats',
  pipeline: [
    { '$project' => {
        :metacontent => 1,
        :visits => 1,
    } },
    # One document per visit
    { '$unwind' => '$visits' },
    { '$match' => { 'visits.minute' => { '$gt' => 224 } } },
    { '$sort' => { 'visits.minute' => 1 } },
    # Regroup the visits under their original _id
    { '$group' => {
        :_id => '$_id',
        :visits => { '$push' => '$visits' },
        :visits_count => { '$sum' => 1 },
        :metacontent => { '$addToSet' => '$metacontent' },
      }
    },
    { '$sort' => { 'visits_count' => -1 } },
  ]
}
res = db.command(cmd)['result']
This returns:
[
  [0] {
    "_id" => "20120726/foobar/song/custom-cred",
    "visits" => [
      [0] {
        "country_name" => "UK",
        "iso_two_letter_country_code" => "UK",
        "referer" => "http://localhost:3000/",
        "minute" => 59,
        "token_id" => "134326199711wfryhpdq"
      },
      [1] {
        "country_name" => "UK",
        "iso_two_letter_country_code" => "UK",
        "referer" => "http://localhost:3000/",
        "minute" => 59,
        "token_id" => "134326199711wfryhpdq"
      },
      [2] {
        "country_name" => "US",
        "iso_two_letter_country_code" => "US",
        "referer" => "http://localhost:3000/",
        "minute" => 59,
        "token_id" => "134326199711wfryhpdq"
      }
    ],
    "visits_count" => 1,
    "metacontent" => [
      [0] {
        "date" => "20120726"
      }
    ]
  },
  [1] {
    "_id" => "20120725/foobar/song/test-pg3-long-title-here-test-lorem-ipsum-dolor-lo",
    "visits" => [
      [0] {
        "country_name" => "UK",
        "iso_two_letter_country_code" => "UK",
        "referer" => "http://localhost:3000/",
        "minute" => 58,
        "token_id" => "13432600883knjzcbic"
      }
    ],
    "visits_count" => 1,
    "metacontent" => [
      [0] {
        "date" => "20120725"
      }
    ]
  }
]
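To make the goal concrete, the desired per-country counts for the first document above can be sketched in plain Ruby, with no MongoDB involved. The helper name `country_counts` is made up for illustration, and the sample data keeps only the `country_name` field.

```ruby
# Plain-Ruby illustration of the desired result; no database needed.
# `country_counts` is a hypothetical helper, not part of any driver.
def country_counts(visits)
  visits.each_with_object(Hash.new(0)) do |visit, counts|
    counts[visit['country_name']] += 1
  end
end

# Visits copied from the first document in the output above
# (only the field that matters here).
visits = [
  { 'country_name' => 'UK' },
  { 'country_name' => 'UK' },
  { 'country_name' => 'US' }
]

p country_counts(visits)
```

For that document the counts come out as two visits for UK and one for US, which is the shape the aggregation should return per _id.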
Solution
I changed the $group stage so that its key is a concatenation of _id and country_name:
cmd = {
  aggregate: 'live_daily_stats',
  pipeline: [
    { '$project' => {
        :metacontent => 1,
        :visits => 1,
    } },
    { '$unwind' => '$visits' },
    { '$match' => { 'visits.minute' => { '$gt' => 224 } } },
    { '$sort' => { 'visits.minute' => 1 } },
    { '$group' => {
        :_id => { '$add' => ['$_id', '$visits.country_name'] },
        :visits => { '$push' => '$visits' },
        :visits_count => { '$sum' => 1 },
        :metacontent => { '$addToSet' => '$metacontent' },
      }
    },
    { '$sort' => { 'visits_count' => -1 } },
  ]
}
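A caveat on the concatenation: as far as I know, released MongoDB versions only accept numbers and dates in $add, and string concatenation is done with $concat (available from MongoDB 2.4). A minimal sketch of the same $group stage using $concat follows. The '/' separator is my own choice, and the stage is only built here, not run against a server.

```ruby
# Sketch: $group stage keyed on a concatenated string via $concat.
# Assumes MongoDB 2.4 or later; the '/' separator is an illustrative choice.
group_stage = {
  '$group' => {
    :_id => { '$concat' => ['$_id', '/', '$visits.country_name'] },
    :visits_count => { '$sum' => 1 }
  }
}

p group_stage
```

This stage would take the place of the $group entry in the pipeline above. Both operands must be strings for $concat to work, which is the case for the _id and country_name values shown in the question.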
OTHER TIPS
From the documentation:
$group Groups documents together for the purpose of calculating aggregate values based on a collection of documents. Practically, group often supports tasks such as average page views for each page in a website on a daily basis.
The output of $group depends on how you define groups. Begin by specifying an identifier (i.e. a _id field) for the group you’re creating with this pipeline. You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields.
Every group expression must specify an _id field. You may specify the _id field as a dotted field path reference, a document with multiple fields enclosed in braces (i.e. { and }), or a constant value.
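The three ways of specifying _id that the documentation mentions can be sketched as Ruby hashes. The variable names, the :page and :country key names, and the constant 'all' are illustrative; the stages are only constructed here, not executed.

```ruby
# 1. A dotted field path: one group per distinct country.
by_field_path = {
  '$group' => { :_id => '$visits.country_name', :count => { '$sum' => 1 } }
}

# 2. A document with multiple fields: one group per (page, country) pair.
by_document = {
  '$group' => {
    :_id => { :page => '$_id', :country => '$visits.country_name' },
    :count => { '$sum' => 1 }
  }
}

# 3. A constant value: every input document lands in a single group.
by_constant = {
  '$group' => { :_id => 'all', :count => { '$sum' => 1 } }
}

[by_field_path, by_document, by_constant].each { |stage| p stage }
```

The second form is the one that applies to this question, because it keeps the page and the country as separate parts of the key.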
I'd try grouping by both _id and country first (letting you do the count you want), then group the result just by _id to give the structure you want.
Updated:
I was thinking of something like this, but I don't have an environment set up to check it:
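require 'mongo'

conn = Mongo::Connection.new
db = conn['foobar_development']
cmd = {
  aggregate: 'live_daily_stats',
  pipeline: [
    { '$project' => {
        :metacontent => 1,
        :visits => 1,
    } },
    { '$unwind' => '$visits' },
    { '$match' => { 'visits.minute' => { '$gt' => 224 } } },
    { '$sort' => { 'visits.minute' => 1 } },
    # First pass: one group per (_id, country) pair, so the count is per country.
    { '$group' => {
        # A compound key must be a document with named fields
        # (the key names :page_id and :country are illustrative).
        :_id => {
          :page_id => '$_id',
          :country => '$visits.iso_two_letter_country_code',
        },
        # Non-_id fields in $group need an accumulator, hence $first.
        :page_id => { '$first' => '$_id' },
        :visits_count => { '$sum' => 1 },
        # .... whatever you want ...
        :metacontent => { '$addToSet' => '$metacontent' },
      }
    },
    # Second pass: regroup by the original _id.
    { '$group' => {
        :_id => '$page_id',
        # .... whatever you want ...
      }
    },
    # This sort only has an effect if the second $group emits visits_count.
    { '$sort' => { 'visits_count' => -1 } },
  ]
}
res = db.command(cmd)['result']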
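Since I can't run the pipeline, here is a plain-Ruby emulation of the two-step grouping on made-up data, to show the structure it is meant to produce. The rows, the field names ('page', 'country', 'countries') and the page values are assumptions for illustration. The first step mimics a $group on the (page, country) pair, the second a $group on the page alone.

```ruby
# In-memory emulation of the two-step grouping; no MongoDB required.
# Each row stands for one document after $unwind (one row per visit).
rows = [
  { 'page' => 'a', 'country' => 'UK' },
  { 'page' => 'a', 'country' => 'UK' },
  { 'page' => 'a', 'country' => 'US' },
  { 'page' => 'b', 'country' => 'UK' }
]

# Step 1: group by (page, country) and count, like the first $group.
per_pair = rows.group_by { |r| [r['page'], r['country']] }.map do |(page, country), group|
  { 'page' => page, 'country' => country, 'count' => group.size }
end

# Step 2: regroup by page only, collecting the per-country counts
# and summing them into a total, like the second $group.
per_page = per_pair.group_by { |r| r['page'] }.map do |page, group|
  {
    '_id' => page,
    'countries' => group.map { |r| { 'country' => r['country'], 'count' => r['count'] } },
    'visits_count' => group.map { |r| r['count'] }.inject(0, :+)
  }
end

# Final step: sort by total visits, descending, like the trailing $sort.
per_page = per_page.sort_by { |doc| -doc['visits_count'] }

per_page.each { |doc| p doc }
```

Page 'a' ends up with a UK count of 2 and a US count of 1 for a total of 3, and page 'b' with a single UK visit. In the real pipeline the second $group would need accumulators that build the equivalent fields.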