Question

Note: I have included only a few documents in the output to keep the post short but illustrative.

The source collection :

{
        "_id" : {
                "SpId" : 840,
                "Scheduler_Id" : 1,
                "Channel_Id" : 2,
                "TweetId" : 15
        },
        "PostDate" : ISODate("2013-10-31T18:30:00Z")
}
{
        "_id" : {
                "SpId" : 840,
                "Scheduler_Id" : 1,
                "Channel_Id" : 2,
                "TweetId" : 16
        },
        "PostDate" : ISODate("2013-10-31T18:30:00Z")
}
{
        "_id" : {
                "SpId" : 840,
                "Scheduler_Id" : 1,
                "Channel_Id" : 2,
                "TweetId" : 17
        },
        "PostDate" : ISODate("2013-10-30T18:30:00Z")
}

Step-1 : Grouping by PostDate

Query :

db.Twitter_Processed.aggregate(
    { $match : { "_id.SpId" : 840, "_id.Scheduler_Id" : 1 }},
    { $project : {
        SpId : "$_id.SpId",
        Scheduler_Id : "$_id.Scheduler_Id",
        day : { $dayOfMonth : "$PostDate" },
        month : { $month : "$PostDate" },
        year : { $year : "$PostDate" },
        senti : "$Sentiment"
    }},
    { $group : {
        _id : { SpId : "$SpId", Scheduler_Id : "$Scheduler_Id", day : "$day", month : "$month", year : "$year" },
        sentiment : { $sum : "$senti" }
    }},
    { $group : { _id : "$_id", avgSentiment : { $avg : "$sentiment" }}}
)

Output :

{
        "result" : [
                {
                        "_id" : {
                                "SpId" : 840,
                                "Scheduler_Id" : 1,
                                "day" : 31,
                                "month" : 10,
                                "year" : 2013
                        },
                        "avgSentiment" : 2.2700000000000005
                },
                {
                        "_id" : {
                                "SpId" : 840,
                                "Scheduler_Id" : 1,
                                "day" : 30,
                                "month" : 10,
                                "year" : 2013
                        },
                        "avgSentiment" : 4.96
                }
        ]
}

Step-2 : Attempting to achieve this :

{
        "result" : [
                {
                        "_id" : {
                                "SpId" : 840,
                                "Scheduler_Id" : 1,
                                "Date" : ISODate("2013-10-31T18:30:00Z")
                        },
                        "avgSentiment" : 2.2700000000000005
                },
                {
                        "_id" : {
                                "SpId" : 840,
                                "Scheduler_Id" : 1,
                                "Date" : ISODate("2013-10-30T18:30:00Z")
                        },
                        "avgSentiment" : 4.96
                }
        ]
}

The query I attempted :

db.Twitter_Processed.aggregate(
    { $match : { "_id.SpId" : 840, "_id.Scheduler_Id" : 1 }},
    { $project : {
        SpId : "$_id.SpId",
        Scheduler_Id : "$_id.Scheduler_Id",
        day : { $dayOfMonth : "$PostDate" },
        month : { $month : "$PostDate" },
        year : { $year : "$PostDate" },
        senti : "$Sentiment"
    }},
    { $group : {
        _id : { SpId : "$SpId", Scheduler_Id : "$Scheduler_Id", day : "$day", month : "$month", year : "$year" },
        sentiment : { $sum : "$senti" }
    }},
    { $group : { _id : "$_id", avgSentiment : { $avg : "$sentiment" }}},
    { $project : {
        _id : {
            SpId : "$_id.SpId",
            Scheduler_Id : "$_id.Scheduler_Id",
            date : new Date("$_id.year","$_id.month","$_id.day")
        },
        avgSentiment : "$avgSentiment"
    }}
)

Output (error):

Error: Printing Stack Trace
    at printStackTrace (src/mongo/shell/utils.js:37:15)
    at DBCollection.aggregate (src/mongo/shell/collection.js:897:9)
    at (shell):1:22
Tue Dec 31 09:41:42.916 JavaScript execution failed: aggregate failed: {
        "errmsg" : "exception: disallowed field type Date in object expression (at 'date')",
        "code" : 15992,
        "ok" : 0
} at src/mongo/shell/collection.js:L898

How do I achieve Step-2?

Solution

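As you've noticed, the Aggregation Framework (as of MongoDB 2.4) has operators to extract parts of a date but none to easily construct a date field. Your attempt fails because the shell evaluates new Date("$_id.year","$_id.month","$_id.day") on the client before the pipeline is sent, so the "$..." strings are never resolved as field paths, and $project then rejects the resulting literal Date value (error 15992).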
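
A minimal sketch of what happens to the constructor call from the attempted query, written as plain JavaScript (run here under Node.js; the variable names attempted and isInvalid are only for illustration). It assumes nothing about MongoDB itself and only shows that the Date is built on the client from ordinary strings:

```javascript
// The "$..." strings are plain strings at this point, not references to
// fields in the pipeline, because this line runs on the client.
var attempted = new Date("$_id.year", "$_id.month", "$_id.day");

// With several arguments, each one is coerced to a number. The strings
// coerce to NaN, so the result is an invalid Date.
var isInvalid = isNaN(attempted.getTime());

console.log(isInvalid); // true
```

Whatever the client produces here, the server receives a literal Date inside $project rather than an expression, which is what the "disallowed field type Date" message is complaining about.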

There's a great blog post, "Stupid date tricks with Aggregation Framework", that provides a creative workaround: use $project to subtract the time-of-day components from PostDate, truncating it to day granularity, before you $group:

db.Twitter_Processed.aggregate(

    // Match (can take advantage of suitable index)
    { $match : {
        "_id.SpId" : 840,
        "_id.Scheduler_Id" : 1
    }},

    // Extract h/m/s/ms values from PostDate for rounding
    { $project: {
        SpId : "$_id.SpId",
        Scheduler_Id : "$_id.Scheduler_Id",
        PostDate : "$PostDate",
        h  : { "$hour"   : "$PostDate" },
        m  : { "$minute" : "$PostDate" },
        s  : { "$second" : "$PostDate" },
        ms : { "$millisecond" : "$PostDate" },
        senti : "$Sentiment"
    }},

    // Subtract the h/m/s/ms values to round the date off to yyyy-mm-dd
    { $project: {
        SpId : "$_id.SpId",
        Scheduler_Id : "$_id.Scheduler_Id",

        // PostDate will end up truncated to yyyy-mm-dd granularity
        PostDate: {
            "$subtract" : [
                "$PostDate",
                {
                    "$add" : [
                        "$ms",
                        { "$multiply" : [ "$s", 1000 ] },
                        { "$multiply" : [ "$m", 60, 1000 ] },
                        { "$multiply" : [ "$h", 60, 60, 1000 ]}
                    ]
                }
            ]
        },
        // "Sentiment" was already renamed to "senti" by the previous $project
        senti: "$senti"
    }},

    { $group : {
        _id : {
            SpId : "$SpId",
            Scheduler_Id : "$Scheduler_Id",
            PostDate: "$PostDate"
        },
        sentiment : { $sum : "$senti"}
    }},

    { $group : {
        _id : "$_id" ,
        avgSentiment : {$avg : "$sentiment"}
    }}
)
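
The truncation arithmetic used in the second $project can be checked outside MongoDB. The sketch below repeats it in plain JavaScript for one of the sample PostDate values. truncateToDay is a hypothetical helper name, and the sketch assumes the date operators extract their parts in UTC:

```javascript
// Hypothetical helper mirroring the $subtract / $add / $multiply expression
// from the pipeline: remove the time-of-day portion from a date.
function truncateToDay(postDate) {
    // Same parts the first $project extracts (assumed to be UTC-based).
    var h  = postDate.getUTCHours();
    var m  = postDate.getUTCMinutes();
    var s  = postDate.getUTCSeconds();
    var ms = postDate.getUTCMilliseconds();

    // Milliseconds elapsed since midnight, as in the $add expression.
    var sinceMidnight = ms + s * 1000 + m * 60 * 1000 + h * 60 * 60 * 1000;

    // Date minus a number of milliseconds gives the truncated date.
    return new Date(postDate.getTime() - sinceMidnight);
}

var truncated = truncateToDay(new Date("2013-10-31T18:30:00Z"));
console.log(truncated.toISOString()); // 2013-10-31T00:00:00.000Z
```

Every document posted on the same UTC day ends up with the same truncated value, which is why it can serve directly as part of the $group key.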
Licensed under: CC-BY-SA with attribution