Question

There are many documents like these:

{
        "_id"   : ObjectId("506ddd1900a47d802702a904"),
        "subid" : "s1",
        "total" : "300",
        "details" :[{
                      name:"d1", value: "100"
                    },
                    {
                      name:"d2", value: "200"
                    }]
}
{
        "_id"   : ObjectId("306fff1900a47d802702567"),
        "subid" : "s1",
        "total" : "700",
        "details" : [{
                      name:"d1", value: "300"
                    },
                    {
                      name:"d8", value: "400"
                    }]
 }

The elements in the 'details' arrays may vary.

The question is: how can I get a result like this with the aggregation framework and Java?

{
        "_id"     : "s1",
        "total"   : "1000",
        "details" : [{
                      name:"d1", value: "400"
                    },
                    {
                      name:"d2", value: "200"
                    },
                    {
                      name:"d8", value: "400"
                    }]
 }

Or maybe I should use custom map-reduce functions here?


Solution

This is very achievable with aggregate(), though a little obtuse, so let's run through it:

db.collection.aggregate([

    // First group to get the *master* total for the documents.
    // Note: $sum ignores non-numeric values, and the sample documents store
    // "total" and "value" as strings, so they are converted with $toInt
    // (MongoDB 4.0+). If they are stored as numbers, "$total" alone works.
    {"$group": {
        "_id": "$subid",
        "total": { "$sum": { "$toInt": "$total" } },
        "details": { "$push": "$details" }
    }},

    // Unwind the details
    {"$unwind": "$details"},

    // Unwind the details "again", since we *pushed* an array onto an array
    {"$unwind": "$details"},

    // Now sum up the values by each name (keeping _id and total in the key)
    {"$group": {
        "_id": {
            "_id": "$_id",
            "total": "$total",
            "name": "$details.name"
        },
        "value": { "$sum": { "$toInt": "$details.value" } }
    }},

    // Sort the names (because you expect that!)
    {"$sort": { "_id.name": 1 }},

    // Do some initial re-shaping for convenience
    {"$project": {
        "_id": "$_id._id",
        "total": "$_id.total",
        "details": { "name": "$_id.name", "value": "$value" }
    }},

    // Now push everything back into an array form
    {"$group": {
        "_id": {
            "_id": "$_id",
            "total": "$total"
        },
        "details": { "$push": "$details" }
    }},

    // And finally project nicely
    {"$project": {
        "_id": "$_id._id",
        "total": "$_id.total",
        "details": 1
    }}
])

So if you tried this before, you might have missed the idea of doing an initial $group to get the top-level sum of the total field across your documents.
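To make that first stage concrete, here is a plain JavaScript sketch (runnable in Node.js, no MongoDB involved) that mimics what the first $group computes on the two sample documents. The helper name firstGroup() is hypothetical. It also makes one caveat visible: $sum ignores non-numeric values, and the sample stores total as strings, so the sketch converts with Number(); a real pipeline needs the field stored as a number or explicitly converted (for example with $toInt on MongoDB 4.0+).

```javascript
// Plain-JavaScript sketch of the first $group stage (not MongoDB code).
const docs = [
  { subid: "s1", total: "300",
    details: [{ name: "d1", value: "100" }, { name: "d2", value: "200" }] },
  { subid: "s1", total: "700",
    details: [{ name: "d1", value: "300" }, { name: "d8", value: "400" }] },
];

// Mimics: {"$group": {"_id": "$subid", "total": {"$sum": ...},
//                     "details": {"$push": "$details"}}}
function firstGroup(input) {
  const groups = new Map();
  for (const doc of input) {
    if (!groups.has(doc.subid)) {
      groups.set(doc.subid, { _id: doc.subid, total: 0, details: [] });
    }
    const g = groups.get(doc.subid);
    // Number() stands in for a string-to-number conversion; a raw $sum
    // would skip the string "300" entirely.
    g.total += Number(doc.total);
    // $push appends the whole "details" array, giving an array of arrays.
    g.details.push(doc.details);
  }
  return [...groups.values()];
}

const grouped = firstGroup(docs);
console.log(JSON.stringify(grouped));
```

The result is a single s1 document whose details field is an array of two arrays, which is exactly why the next step needs two unwinds.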

Admittedly, the tricky bit is "getting your head around" the double unwind that comes next. Because that first group pushed each document's array into another array, we end up with a nested structure (an array of arrays) that has to be unwound twice to reach a "de-normalized" form: one document per detail entry.
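The double unwind is easier to see on data. This plain JavaScript sketch (again, not MongoDB code) starts from the shape the first $group produces, with the values assumed to be numbers already, and applies a hypothetical unwind() helper twice. Each call emits one copy of the document per element of details, which is what $unwind does.

```javascript
// Shape produced by the first $group: "details" is an array of arrays.
// Values are assumed to be numeric at this point.
const afterGroup = [{
  _id: "s1",
  total: 1000,
  details: [
    [{ name: "d1", value: 100 }, { name: "d2", value: 200 }],
    [{ name: "d1", value: 300 }, { name: "d8", value: 400 }],
  ],
}];

// One {"$unwind": "$details"}: one copy of the document per array element.
// Assumes every document has a "details" array (empty arrays yield nothing).
function unwind(input) {
  return input.flatMap(doc => doc.details.map(d => ({ ...doc, details: d })));
}

const once = unwind(afterGroup); // 2 docs; each "details" is still an array
const twice = unwind(once);      // 4 docs; each "details" is a single object
console.log(twice.map(d => d.details.name));
```

After the first call each document still carries an inner array; only the second call reaches individual name/value objects.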

Once you've done that, you just $group on the name field, which is roughly the SQL equivalent of:

GROUP BY _id, total, details.name
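That GROUP BY can be sketched in plain JavaScript as well. Here a hypothetical groupByName() helper builds a composite key from _id, total and the detail name (mirroring the sub-document used as the group _id) and sums value per key. The input is the four documents the double unwind produces, with values assumed numeric.

```javascript
// Output of the double unwind: one document per detail entry.
const unwound = [
  { _id: "s1", total: 1000, details: { name: "d1", value: 100 } },
  { _id: "s1", total: 1000, details: { name: "d2", value: 200 } },
  { _id: "s1", total: 1000, details: { name: "d1", value: 300 } },
  { _id: "s1", total: 1000, details: { name: "d8", value: 400 } },
];

// Mimics the second $group: sum "details.value" per (_id, total, name).
function groupByName(rows) {
  const sums = new Map();
  for (const row of rows) {
    // Composite key, like the sub-document used as _id in the $group stage.
    const key = JSON.stringify([row._id, row.total, row.details.name]);
    if (!sums.has(key)) {
      sums.set(key, {
        _id: { _id: row._id, total: row.total, name: row.details.name },
        value: 0,
      });
    }
    sums.get(key).value += row.details.value;
  }
  return [...sums.values()];
}

const byName = groupByName(unwound);
console.log(JSON.stringify(byName));
```

Keeping _id and total inside the key is what carries them through this stage; otherwise they would be lost, because $group only outputs its _id and accumulator fields.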

With that grouping in place (plus some sensible re-shaping), we sort by the name key (because that's the order you listed them in), push the details back into an array, and finally $project into the exact form you wanted.
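The remaining stages only reshape. This last plain JavaScript sketch (the helper name reshape() is hypothetical) takes the per-name sums, sorts them by name, lifts the fields out of the compound _id, and pushes the details back into one array per subid, folding the final $project into the regroup step for brevity. The input is deliberately out of order: $group does not guarantee any output order, which is why the $sort stage is there.

```javascript
// Output of the second $group, in no particular order.
const byName = [
  { _id: { _id: "s1", total: 1000, name: "d8" }, value: 400 },
  { _id: { _id: "s1", total: 1000, name: "d1" }, value: 400 },
  { _id: { _id: "s1", total: 1000, name: "d2" }, value: 200 },
];

function reshape(rows) {
  // {"$sort": {"_id.name": 1}}: ascending by name, simple string comparison
  const sorted = [...rows].sort((a, b) =>
    a._id.name < b._id.name ? -1 : a._id.name > b._id.name ? 1 : 0);

  // {"$project": ...}: lift the compound _id out, build one detail object
  const projected = sorted.map(r => ({
    _id: r._id._id,
    total: r._id.total,
    details: { name: r._id.name, value: r.value },
  }));

  // {"$group": ... {"$push": "$details"}} plus the final {"$project"}:
  // regroup on (_id, total) and collect the details back into an array,
  // writing _id and total directly at the top level.
  const groups = new Map();
  for (const p of projected) {
    const key = JSON.stringify([p._id, p.total]);
    if (!groups.has(key)) {
      groups.set(key, { _id: p._id, total: p.total, details: [] });
    }
    groups.get(key).details.push(p.details);
  }
  return [...groups.values()];
}

const result = reshape(byName);
console.log(JSON.stringify(result));
```

The result is one s1 document with total 1000 and details d1/400, d2/200, d8/400, matching the requested shape except that the numbers are numbers rather than strings.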

So, bingo, we have your result. Thanks for a cool question that shows off the double unwind.

Licensed under: CC-BY-SA with attribution