Question

Im trying to search through a collection and group records by date field which is a datetime. I know pymongo converts those to the proper type on the background (ISODate or something like that).

Question is, since datetime objects have date, time, timezone.. how can i tell the group operator to use only the date portion? Because otherwise i dont get the desired grouping since time is preventing the records with same day, month, year to be grouped together.

db.test.aggregate([
        {"$group": {
             "_id": "$date", 
             "count": {"$sum": 1}
        }},
        {"$limit": 10}])

Result:

{u'ok': 1.0,
 u'result': [
  {u'_id': datetime.datetime(2014, 2, 15, 18, 49, 9, tzinfo=<bson.tz_util.FixedOffset object at 0x318f210>),
   u'count': 1},
  {u'_id': datetime.datetime(2014, 2, 15, 18, 36, 38, tzinfo=<bson.tz_util.FixedOffset object at 0x318f210>),
   u'count': 1},
  {u'_id': datetime.datetime(2014, 2, 15, 18, 23, 56, tzinfo=<bson.tz_util.FixedOffset object at 0x318f210>),
   u'count': 1}]}

It would be nice to control the datetime information used to group,

  • group by date only
  • group by date and hour
  • group by date, hour and minute
  • etc.

Is there something like: (or some way of telling to use date only)

db.test.aggregate([
          {"$group": {
              "_id": "$date.date()",
              "count": {"$sum": 1}
          }},
          {"$sort": "_id"}
])

Or maybe there’s another way of dealing with this, any ideas? Thanks.

Was it helpful?

Solution

Yes. You can use the Date Operators with $substr and $concat to tie it all together.

db.test.aggregate([
    {"$group": {
        "_id" : { "$concat": [
            {"$substr": [{"$year": "$date"}, 0, 4 ]},
            "-",
            {"$substr": [{"$month": "$date"}, 0, 2 ]},
            "-",
            {"$substr": [{"$dayOfMonth": "$date"}, 0, 2 ]},
        ]},
        "count": {"$sum": 1 }
     }},
     {"$sort": { "_id": 1 }}
])

You could use just the date operators and make a document as in:

"day": { 
    "year": {"$year": "$date" },
   "month": {"$month": "$date"}, 
   "day": {"$dayOfYear": "$date"}
}

That works just as well. But this gives you a nice string. This makes use of the fact that $substr will cast from integer to string. If that ever gets added to the documentation.

Look at the Date Operators documentation for usage on the other time divisions that can be used on dates.


Better yet, use date math to return a BSON Date:

import datetime

db.test.aggregate([
    { "$group": {
        "_id": {
            "$add": [
               { "$subtract": [
                   { "$subtract": [ "$date", datetime.datetime.utcfromtimestamp(0) ] },
                   { "$mod": [
                       { "$subtract": [ "$date", datetime.datetime.utcfromtimestamp(0) ] },
                       1000 * 60 * 60 * 24
                   ]}
               ]},
               datetime.datetime.utcfromtimestamp(0)
           ]
        },
        "count": { "$sum": 1 }
    }},
    { "$sort": { "_id": 1 } }
])

Here datetime.datetime.utcfromtimestamp(0) will be fed into the pipeline as a BSON Date representing "epoch". When you $subtract one BSON Date from another the difference in milliseconds is returned. This allows you to "round" the date to the current day by again subtracting the $mod result to get the remainder of milliseconds difference from a day.

The same is true of $add where "adding" a BSON Date to a numeric value will result in a BSON Date.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top