Question

I have a collection which contains documents with multiple arrays. These are generally quite large, but for purposes of explaining you can consider the following two documents:

{
    "obj1": [
        { "a": "a", "b": "b" },
        { "a": "a", "b": "c" },
        { "a": "a", "b": "b" }
    ],
    "obj2": [
        { "a": "a", "b": "b" },
        { "a": "a", "b": "c" }
    ]
},
{
    "obj1": [
        { "a": "c", "b": "b" }
    ],
    "obj2": [
        { "a": "c", "b": "c" }
    ]
}

The goal is to return only the array elements that match the query. Multiple matches are required, and across more than one array, so this is beyond what projection with the positional $ operator can do. The desired result would be:

{
    "obj1": [
        { "a": "a", "b": "b" },
        { "a": "a", "b": "b" }
    ],
    "obj2": [
        { "a": "a", "b": "b" }
    ]
}

A traditional approach would be something like this:

db.objects.aggregate([
    { "$match": {
        "obj1": {
            "$elemMatch": { "a": "a", "b": "b" }
        },
        "obj2.b": "b"
    }},
    { "$unwind": "$obj1" },
    { "$match": {
        "obj1.a": "a",
        "obj1.b": "b"
    }},
    { "$unwind": "$obj2" },
    { "$match": { "obj2.b": "b" }},
    { "$group": {
        "_id": "$_id",
        "obj1": { "$addToSet": "$obj1" },
        "obj2": { "$addToSet": "$obj2" }
    }}
])

But using $unwind on both arrays multiplies the number of documents passing through the pipeline, which uses a lot of memory and slows things down. There are also possible problems with $addToSet, which collapses duplicate elements, and splitting the $group stages for each array can make things even slower.
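To make the cost concrete, here is a rough plain-JavaScript model of what $unwind does to the document count. It is only a sketch: the `unwind` helper and the 100-element `bigDoc` are hypothetical stand-ins, not MongoDB code, and it assumes the worst case where every element survives the intermediate $match.

```javascript
// Plain-JavaScript model of $unwind: emit one copy of the document
// per element of the named array. (Hypothetical helper, not a MongoDB API.)
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    doc[field].map((element) => ({ ...doc, [field]: element }))
  );
}

// Hypothetical document with 100 matching elements in each array.
const bigDoc = {
  _id: 1,
  obj1: Array.from({ length: 100 }, () => ({ a: "a", b: "b" })),
  obj2: Array.from({ length: 100 }, () => ({ a: "a", b: "b" })),
};

const afterFirst = unwind([bigDoc], "obj1");
const afterSecond = unwind(afterFirst, "obj2");

console.log(afterFirst.length);  // 100
console.log(afterSecond.length); // 10000
```

One input document becomes 100 × 100 intermediate documents before $group collapses them again, which is where the memory pressure comes from.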

So I am looking for a process that is not so intensive but arrives at the same result.


Solution

Since MongoDB 3.2 we have the $filter operator, which makes this really quite simple:

db.objects.aggregate([
    { "$match": {
        "obj1": {
            "$elemMatch": { "a": "a", "b": "b" }
        },
        "obj2.b": "b"
    }},
    { "$project": {
      "obj1": {
        "$filter": {
          "input": "$obj1",
          "as": "el",
          "cond": {
            "$and": [
              { "$eq": [ "$$el.a", "a" ] },
              { "$eq": [ "$$el.b", "b" ] }
            ]
          }
        }
      },
      "obj2": {
        "$filter": {
          "input": "$obj2",
          "as": "el",
          "cond": { "$eq": [ "$$el.b", "b" ] }
        }
      }
    }}
])
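As a rough mental model, the $filter expressions in that $project stage behave like Array.prototype.filter applied to each array. The sketch below runs outside MongoDB on the first sample document from the question; the `doc` and `projected` names are only illustrative.

```javascript
// First sample document from the question.
const doc = {
  obj1: [
    { a: "a", b: "b" },
    { a: "a", b: "c" },
    { a: "a", b: "b" },
  ],
  obj2: [
    { a: "a", b: "b" },
    { a: "a", b: "c" },
  ],
};

// Plain-JavaScript analogue of the two $filter expressions:
// matching elements are kept in their original order, duplicates included.
const projected = {
  obj1: doc.obj1.filter((el) => el.a === "a" && el.b === "b"),
  obj2: doc.obj2.filter((el) => el.b === "b"),
};

console.log(JSON.stringify(projected));
// {"obj1":[{"a":"a","b":"b"},{"a":"a","b":"b"}],"obj2":[{"a":"a","b":"b"}]}
```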

For earlier releases without $filter, MongoDB 2.6 introduced the $map operator, which can act on arrays in place without the need for $unwind. Combined with some of the logical operators and set operators that were also added to the aggregation framework, it provides a solution to this problem and others:

db.objects.aggregate([
    { "$match": {
        "obj1": {
            "$elemMatch": { "a": "a", "b": "b" }
        },
        "obj2.b": "b"
    }},
    { "$project": {
        "obj1": {
            "$setDifference": [
                { "$map": {
                    "input": "$obj1",
                    "as": "el",
                    "in": {
                         "$cond": [
                           { "$and": [
                                { "$eq": [ "$$el.a", "a" ] },
                                { "$eq": [ "$$el.b", "b" ] }
                            ]},
                            "$$el",
                            false
                        ]
                    }
                }},
                [false]
            ]
        },
        "obj2": {
            "$setDifference": [
                { "$map": {
                    "input": "$obj2",
                    "as": "el",
                    "in": {
                        "$cond": [
                            { "$eq": [ "$$el.b", "b" ] },
                            "$$el",
                            false
                        ]
                    }
                }},
                [false]
            ]
        }
    }}
])

The core of this is the $map operator, which works like an internalized $unwind: it processes every array element, and also allows operations to act on those elements in the same statement. Typically this would take several pipeline stages, but here it can be done within a single $project, $group or $redact stage.

In this case the inner processing uses the $cond operator, which evaluates a logical condition and returns a different result for true or false. Here the $eq operator tests the values of fields within the current element, in much the same way as a separate $match pipeline stage would. The $and operator combines the results of multiple conditions on the element, much as the $elemMatch operator does within a $match pipeline stage.

Finally, since the $cond operator returns either the current element or false when the condition is not met, we need to "filter" any false values from the array produced by the $map operation. This is where the $setDifference operator is used to compare the two input arrays and return the difference. So when compared to an array that contains only false as its element, the result is the elements returned from $map without the false values that $cond emitted when the conditions were not met. Note that $setDifference treats its inputs as sets, so duplicate elements are collapsed into one and the order of the output is not guaranteed.
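The $map, $cond and $setDifference combination can be sketched in plain JavaScript as follows. This is an approximation, not MongoDB code: `mapWithCond` and `setDifference` are hypothetical helpers, and JSON.stringify is assumed to be a good enough stand-in for MongoDB's value comparison on these small documents.

```javascript
// $map + $cond: keep the element when it matches, otherwise emit false.
function mapWithCond(array, predicate) {
  return array.map((el) => (predicate(el) ? el : false));
}

// $setDifference: unique members of `first` that do not appear in `second`.
// JSON.stringify is a simplifying stand-in for structural equality.
function setDifference(first, second) {
  const exclude = new Set(second.map((value) => JSON.stringify(value)));
  const seen = new Set();
  const result = [];
  for (const value of first) {
    const key = JSON.stringify(value);
    if (!exclude.has(key) && !seen.has(key)) {
      seen.add(key);
      result.push(value);
    }
  }
  return result;
}

// obj1 from the first sample document in the question.
const obj1 = [
  { a: "a", b: "b" },
  { a: "a", b: "c" },
  { a: "a", b: "b" },
];

const mapped = mapWithCond(obj1, (el) => el.a === "a" && el.b === "b");
const filtered = setDifference(mapped, [false]);

console.log(JSON.stringify(mapped));   // [{"a":"a","b":"b"},false,{"a":"a","b":"b"}]
console.log(JSON.stringify(filtered)); // [{"a":"a","b":"b"}]
```

Because the result is a set, the two identical matches in obj1 come back as a single element. If duplicates must be preserved, this approach is not an exact substitute for filtering the array element by element.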

The result filters only the matching elements from the array without having to run through separate pipeline stages for $unwind, $match and $group.

Licensed under: CC-BY-SA with attribution