Use MongoDB projection for nesting whole documents?

https://stackoverflow.com//questions/25017543

21-12-2019
Question

I have a flat collection of documents, where some documents have a parent: ObjectId field that points to another document in the same collection, e.g.:

{id: 1, metadata: {text: "I'm a parent"}}
{id: 2, metadata: {text: "I'm child 1", parent: 1}}

Now I'd like to retrieve all parents where metadata.text = "I'm a parent", plus their child elements. But I want that data in a nested format, so I can simply process it afterwards without having to look at metadata.parent. The output should look like:

{
  id: 1,
  metadata: {text: "I'm a parent"},
  children: [
    {id: 2, metadata: {text: "I'm child 1", parent: 1}}
  ]
}

(children could also be part of the parent's metadata object if that's easier)

Why don't I save the documents in a nested structure? I don't want to store the data in a nested format in DB, because those documents are part of GridFS.

The main problem is: How can I tell MongoDB to nest a whole document? Or do I have to use Mongo's aggregation framework for that task?

Solution

For the kind of "projection" you are asking for, the aggregation framework is the correct tool, as this sort of "document re-shaping" is only really supported there.

The other part is the "parent/child" relationship, where you again need to be "creative" when grouping with the aggregation framework. The full pipeline shows what is involved:

db.collection.aggregate([

    // Group parent and children together with conditionals
    { "$group": {
        "_id": { "$ifNull": [ "$metadata.parent", "$_id" ] },
        "metadata": {
            "$addToSet": {
                "$cond": [
                    { "$ifNull": [ "$metadata.parent", false ] },
                    false,
                    "$metadata"
                ]
            }
        },
        "children": {
            "$push": {
                "$cond": [
                    { "$ifNull": [ "$metadata.parent", false ] },
                    "$$ROOT",
                    false
                ]
            }
        }
    }},

    // Filter out "false" values
    { "$project": {
        "metadata": { "$setDifference": [ "$metadata", [false] ] },
        "children": { "$setDifference": [ "$children", [false] ] }
    }},

    // metadata is an array but should only have one item
    { "$unwind": "$metadata" },

    // Sort the children, as "sets" are un-ordered.
    // Note: $unwind emits nothing for an empty array, so parents without children drop out here.
    { "$unwind": "$children" },
    { "$sort": { "_id": 1, "children._id": 1 } },
    { "$group": {
        "_id": "$_id",
        "metadata": { "$first": "$metadata" },
        "children": { "$push": "$children" }
    }}
])

The main thing here is the $ifNull operator used on the grouping _id. This will choose to $group on the "parent" field where present, otherwise using the general document _id.

The $cond operator is used in a similar way in the accumulators to decide which data to add to the array or "set": a document with a parent contributes to "children", and one without contributes its "metadata". In the following $project the false placeholders are filtered out with the $setDifference operator.

If the final $sort and $group stages seem confusing, the reason is that $setDifference is a "set" operator, so the resulting "set" is considered to be un-ordered. That part is only there to make sure that the array contents appear in order of their own _id field.
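
To see what the grouping key does without a running server, the same logic can be imitated in plain JavaScript. This is only a rough sketch, not MongoDB code: nestOneLevel and the sample documents are made up for illustration, and it assumes numeric _id values so they can be used as Map keys and sorted.

```javascript
// In-memory imitation of the pipeline's grouping logic (no MongoDB involved).
// nestOneLevel and the sample data are hypothetical; ids are assumed numeric.
function nestOneLevel(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const parentId = doc.metadata.parent;
    // Same role as $ifNull: group on the parent id if present, else on the own _id.
    const key = parentId ?? doc._id;
    if (!groups.has(key)) {
      groups.set(key, { _id: key, metadata: null, children: [] });
    }
    const group = groups.get(key);
    if (parentId == null) {
      group.metadata = doc.metadata; // a parent contributes its metadata
    } else {
      group.children.push(doc);      // a child is pushed whole, like "$$ROOT"
    }
  }
  return [...groups.values()]
    // Like the two $unwind stages: groups lacking a parent or any child drop out.
    .filter((g) => g.metadata !== null && g.children.length > 0)
    // Like the $sort before the final $group: order children by their own _id.
    .map((g) => ({ ...g, children: [...g.children].sort((a, b) => a._id - b._id) }))
    // Sorting the groups themselves is only for a deterministic result here.
    .sort((a, b) => a._id - b._id);
}

const docs = [
  { _id: 2, metadata: { text: "I'm child 1", parent: 1 } },
  { _id: 1, metadata: { text: "I'm a parent" } },
  { _id: 3, metadata: { text: "I'm child 2", parent: 1 } },
];
const nested = nestOneLevel(docs);
console.log(JSON.stringify(nested, null, 2));
```

Both children end up under the group keyed by 1, because their metadata.parent takes precedence over their own _id when the key is chosen.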

Without the additions from MongoDB 2.6 used above (the $setDifference operator and the $$ROOT variable), this can still be done, just a little differently:

db.collection.aggregate([
    { "$group": {
        "_id": { "$ifNull": [ "$metadata.parent", "$_id" ] },
        "metadata": {
            "$addToSet": {
                "$cond": [
                    { "$ifNull": [ "$metadata.parent", false ] },
                    false,
                    "$metadata"
                ]
            }
        },
        "children": {
            "$push": {
                "$cond": [
                    { "$ifNull": [ "$metadata.parent", false ] },
                    { "_id": "$_id", "metadata": "$metadata" },
                    false
                ]
            }
        }
    }},
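
    // Unwind each array and drop the "false" placeholders
    { "$unwind": "$metadata" },
    { "$match": { "metadata": { "$ne": false } } },
    { "$unwind": "$children" },
    { "$match": { "children": { "$ne": false } } },

    // Put the children in order of their own _id before regrouping
    { "$sort": { "_id": 1, "children._id": 1 } },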
    { "$group": {
        "_id": "$_id",
        "metadata": { "$first": "$metadata" },
        "children": { "$push": "$children" }
    }}
])

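This is essentially the same pipeline, except that the false placeholders are removed with $unwind and $match rather than $setDifference, and each child document is built field by field rather than taken from $$ROOT, so it works in earlier versions as well.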

This will all be fine as long as your relationships are a single level of parent and child. For nested levels you would need to invoke a mapReduce process instead.
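
If the result set is small enough to fetch in full, deeper hierarchies can also be assembled in application code instead of inside the database. The following is a minimal client-side sketch of that alternative, not a mapReduce job: buildTree is a hypothetical helper, and it assumes every document has a metadata object and that _id values are primitives usable as Map keys.

```javascript
// Client-side sketch: build a tree of arbitrary depth from flat documents.
// buildTree is a hypothetical helper; ids are assumed to be primitive values.
function buildTree(docs) {
  // First pass: index a copy of every document by _id, with an empty children array.
  const byId = new Map();
  for (const doc of docs) {
    byId.set(doc._id, { ...doc, children: [] });
  }
  // Second pass: attach each node to its parent, or treat it as a root.
  const roots = [];
  for (const node of byId.values()) {
    const parent = byId.get(node.metadata.parent);
    if (parent) {
      parent.children.push(node);
    } else {
      roots.push(node); // no parent field, or the parent is not in the input
    }
  }
  return roots;
}

const flatDocs = [
  { _id: 1, metadata: { text: "root" } },
  { _id: 2, metadata: { text: "child", parent: 1 } },
  { _id: 3, metadata: { text: "grandchild", parent: 2 } },
];
const tree = buildTree(flatDocs);
console.log(JSON.stringify(tree, null, 2));
```

Because every node is indexed before any are linked, the order of the input documents does not matter.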

OTHER TIPS

I wanted a similar result to Neil Lunn's answer except I wanted to fetch all parents regardless of them having children or not. I also wanted to generalise it to work across any collection that had a single level of nested children.

Here's my query, based on Neil Lunn's answer:

db.collection.aggregate([
  {
    $group: {
      _id: {
        $ifNull: ["$parent", "$_id"]
      },
      parent: {
        $addToSet: {
          $cond: [
            {
              $ifNull: ["$parent", false]
            }, false, "$$ROOT"
          ]
        }
      },
      children: {
        $push: {
          $cond: [
            {
              $ifNull: ["$parent", false]
            }, "$$ROOT", false
          ]
        }
      }
    }
  }, {
    $project: {
      parent: {
        $setDifference: ["$parent", [false]]
      },
      children: {
        $setDifference: ["$children", [false]]
      }
    }
  }, {
    $unwind: "$parent"
  }
])

This returns every parent. The parent field contains the whole parent document, and the children field contains either an array of child documents or, if the parent has no children, an empty array.

{
  _id: PARENT_ID,
  parent: PARENT_OBJECT,
  children: [CHILD_OBJECTS]
}
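
As a rough illustration of that shape, here is the same logic imitated in plain JavaScript on three sample documents. It is a sketch under assumptions rather than MongoDB code: groupWithParents and the name field are invented for the example, and the top-level parent field is assumed to hold a primitive id.

```javascript
// In-memory imitation of the query above (no MongoDB involved).
// groupWithParents and the sample data are hypothetical.
function groupWithParents(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Same role as $ifNull on the group _id.
    const key = doc.parent ?? doc._id;
    if (!groups.has(key)) {
      groups.set(key, { _id: key, parent: null, children: [] });
    }
    const group = groups.get(key);
    if (doc.parent == null) {
      group.parent = doc;       // whole parent document, like "$$ROOT"
    } else {
      group.children.push(doc); // whole child document
    }
  }
  // Like $unwind on "parent": groups whose parent document is missing drop out,
  // while groups with an empty children array are kept.
  return [...groups.values()].filter((g) => g.parent !== null);
}

const result = groupWithParents([
  { _id: 1, name: "parent with a child" },
  { _id: 2, name: "child", parent: 1 },
  { _id: 3, name: "parent without children" },
]);
console.log(JSON.stringify(result, null, 2));
```

The third document has no children, yet it still appears in the result with children set to an empty array, since nothing unwinds the children field.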

Licensed under: CC-BY-SA with attribution