Question

I've got documents with this simplified schema:

{
   position: 10,
   value: 5,
   count: 3
}

What I'd like to compute is this: group those documents by position and find the maximum value among the documents whose count is greater than 4, but only among values less than the minimum value of the documents whose count is less than 4.

Here is what I've done, but it does not work:

{ $group: {
    _id: {
        position: "$position"
    },
    result: { $max: { $cond: [
        { $and: [
            { $gte: ["$count", 4] },
            { $lt: ["$value", { $min: { $cond: [
                { $lt: ["$count", 4] },
                { value: "$value" },
                10
            ] } }] }
        ] },
        { value: "$value", nb: "$count" },
        0
    ] } }
} }

I am told that $min is an invalid operator, and I can't figure out how to write the right aggregation. Would it be better to run a mapReduce?

For example, if I have these documents:

{Position: 10, value: 1, count: 5}
{Position: 10, value: 3, count: 3}
{Position: 10, value: 4, count: 5}
{Position: 10, value: 7, count: 4}

I'd like the result to be:

{Position: 10, value: 1, count: 4}

because it is the maximum 'value' among the documents whose count is greater than 4 while also being below the value 3, which has a count of only 3. That is why the value 4 is not what I'm looking for.


Solution

That is a bit of a mouthful, to say the least, but I'll have another crack at explaining it:

You want:

For each "Position" value, find the document with the largest "value" whose own "count" is greater than 4 and whose "value" is less than the smallest "value" among the documents with a "count" of less than 4.
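Written out as formulas, with notation introduced here purely for illustration (D_p is the set of documents sharing "Position" p, and v_d and c_d are a document's "value" and "count"), the requirement reads roughly as follows:

```latex
\begin{aligned}
m_p &= \min\{\, v_d : d \in D_p,\ c_d < 4 \,\} \\
r_p &= \operatorname*{arg\,max}_{d \in D_p,\ c_d > 4,\ v_d < m_p} v_d
\end{aligned}
```

For the sample data with p = 10, the only document with a count below 4 has value 3, so m_10 = 3. The documents with a count above 4 have values 1 and 4, and only 1 is below 3, so r_10 is the document with value 1 and count 5. The document with a count of exactly 4 belongs to neither set.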

That reads like a math exam problem designed to confuse you with the logic. But once you have caught that meaning, you can perform the aggregation with the following steps:

db.positions.aggregate([
    // Per "Position", split documents into count > 4 ("high") and count < 4 ("low");
    // documents that don't match a side push a null placeholder instead
    { "$group": {
        "_id": "$Position",
        "high": { "$push": {
            "$cond": [
                { "$gt": ["$count", 4] },
                { "value": "$value", "count": "$count" },
                null
            ]
        }},
        "low": { "$push": {
            "$cond": [
                { "$lt": ["$count", 4] },
                { "value": "$value", "count": "$count" },
                null
            ]
        }}
    }},

    // Unwind the "low" counts array
    { "$unwind": "$low" },

    // Find the "$min" value from the low counts
    { "$group": {
        "_id": "$_id",
        "high": { "$first": "$high" },
        "low":  { "$min": "$low.value" }
    }},

    // Unwind the "high" counts array
    { "$unwind": "$high" },

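    // Flag whether each "high" value is less than the "low" minimum
    { "$project": {
         "high": 1,
         "lower": { "$lt": [ "$high.value", "$low" ] }
    }},

    // Drop the null placeholders and any "high" entry that is not below the minimum
    { "$match": { "high": { "$ne": null }, "lower": true } },

    // Sort rather than use $max: $max yields a single field value, but we want the whole document
    { "$sort": { "lower": -1, "high.value": -1 } },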

    // Group again and keep the top-sorted document for each "Position"
    { "$group": {
        "_id": "$_id",
        "value": { "$first": "$high.value" },
        "count": { "$first": "$high.count" }
    }}
])
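If you want to check what the pipeline computes without a server, here is a minimal plain-JavaScript sketch that mirrors its stages on an in-memory array. The function name `bestPerPosition` and its `threshold` parameter are hypothetical, chosen for illustration; this is not a MongoDB API and it does not talk to a database. It also assumes that a "Position" with no documents of count less than 4 should produce no result.

```javascript
// Hypothetical helper: an in-memory mirror of the aggregation stages above.
// It does not connect to MongoDB; it only illustrates the logic.
function bestPerPosition(docs, threshold = 4) {
  // First $group: split each Position's documents into "high" (count > threshold)
  // and "low" (count < threshold). A count equal to the threshold lands in neither.
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.Position)) {
      groups.set(doc.Position, { high: [], low: [] });
    }
    const g = groups.get(doc.Position);
    const entry = { value: doc.value, count: doc.count };
    if (doc.count > threshold) g.high.push(entry);
    if (doc.count < threshold) g.low.push(entry);
  }

  const results = [];
  for (const [position, { high, low }] of groups) {
    // Assumption: with no low-count documents there is no minimum, so skip the Position.
    if (low.length === 0) continue;

    // $unwind "low" + second $group: the smallest low-count value.
    const minLow = Math.min(...low.map((d) => d.value));

    // $unwind "high" + the "lower" comparison: keep only high entries below
    // that minimum, then $sort by value, descending.
    const candidates = high
      .filter((d) => d.value < minLow)
      .sort((a, b) => b.value - a.value);

    // Final $group with $first: the top document, if any survived the filter.
    if (candidates.length > 0) {
      const top = candidates[0];
      results.push({ _id: position, value: top.value, count: top.count });
    }
  }
  return results;
}

// The sample documents from the question.
const sample = [
  { Position: 10, value: 1, count: 5 },
  { Position: 10, value: 3, count: 3 },
  { Position: 10, value: 4, count: 5 },
  { Position: 10, value: 7, count: 4 },
];

console.log(bestPerPosition(sample)); // [ { _id: 10, value: 1, count: 5 } ]
```

Sorting and taking the first element, rather than calling `Math.max`, is deliberate here for the same reason as in the pipeline: it keeps the whole winning entry, not just its value.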

So from the set of documents:

{ "Position" : 10, "value" : 1, "count" : 5 }
{ "Position" : 10, "value" : 3, "count" : 3 }
{ "Position" : 10, "value" : 4, "count" : 5 }
{ "Position" : 10, "value" : 7, "count" : 4 }

Only the first is returned in this case, as its value is less than that of the "count of three" document and its own count is greater than 4.

{ "_id" : 10, "value" : 1, "count" : 5 }

Which I am sure is what you actually meant.

So $min and $max really only apply when you want a single discrete value out of the documents in a grouping. If you are interested in more than one field from a document, or indeed the whole document, then you sort and take the $first or $last entry at the grouping boundary.
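To make the difference concrete, here is a small plain-JavaScript illustration on a hypothetical group of documents (the data is made up for the example). Taking the maximum of each field separately, which is what independent $max accumulators on "$value" and "$count" would give you, can produce a combination of values that no single document has, whereas sorting and taking the first entry returns a real document.

```javascript
// Hypothetical documents that all fall into one grouping.
const group = [
  { value: 1, count: 5 },
  { value: 4, count: 5 },
  { value: 2, count: 9 },
];

// Like separate $max accumulators on "$value" and "$count":
// each field is maximized independently.
const stitched = {
  value: Math.max(...group.map((d) => d.value)),
  count: Math.max(...group.map((d) => d.count)),
};

// Like $sort on value (descending) followed by $first:
// the whole top document is kept intact. Copy first so `group` is not mutated.
const topDoc = [...group].sort((a, b) => b.value - a.value)[0];

console.log(stitched); // { value: 4, count: 9 }  (no such document exists)
console.log(topDoc);   // { value: 4, count: 5 }
```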

As for mapReduce: aggregate is generally much faster, as it runs as native code rather than invoking a JavaScript interpreter.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow