Question

I've been trying to understand the MongoDB aggregation process so I can better optimize my queries, and I'm confused by how $match and $sort behave when used together.

The sample DB has only one collection, people:

[{
    "name": "Joe Smith",
    "age": 40,
    "admin": false
},
{
    "name": "Jen Ford",
    "age": 45,
    "admin": true
},
{
    "name": "Steve Nash",
    "age": 45,
    "admin": true
},
{
    "name": "Ben Simmons",
    "age": 45,
    "admin": true
}]

I've multiplied this data x1000 just as a POC.
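
A minimal sketch of how such a data set could be produced. The post doesn't show how the data was actually loaded, so the helper name `repeatDocs` and the use of `insertMany` are assumptions for illustration only.

```javascript
// The four sample documents from the question.
const samplePeople = [
  { name: "Joe Smith", age: 40, admin: false },
  { name: "Jen Ford", age: 45, admin: true },
  { name: "Steve Nash", age: 45, admin: true },
  { name: "Ben Simmons", age: 45, admin: true }
];

// Hypothetical helper: returns `times` shallow copies of every document,
// so each inserted document gets its own _id.
function repeatDocs(docs, times) {
  const out = [];
  for (let i = 0; i < times; i++) {
    for (const doc of docs) {
      out.push(Object.assign({}, doc));
    }
  }
  return out;
}

// In the mongo shell, where `db` is the current database handle:
//   db.people.insertMany(repeatDocs(samplePeople, 1000))
```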

The collection above has one secondary index, name_1.

The following query

db.people.find({"name": "Jen Ford"}).sort({"_id": -1}).explain()

Has the following output

{ queryPlanner: 
   { plannerVersion: 1,
     namespace: 'db.people',
     indexFilterSet: false,
     parsedQuery: { name: { '$eq': 'Jen Ford' } },
     queryHash: '3AE4BDA3',
     planCacheKey: '2A9CC473',
     winningPlan: 
      { stage: 'SORT',
        sortPattern: { _id: -1 },
        inputStage: 
         { stage: 'SORT_KEY_GENERATOR',
           inputStage: 
            { stage: 'FETCH',
              inputStage: 
               { stage: 'IXSCAN',
                 keyPattern: { name: 1 },
                 indexName: 'name_1',
                 isMultiKey: false,
                 multiKeyPaths: { name: [] },
                 isUnique: false,
                 isSparse: false,
                 isPartial: false,
                 indexVersion: 2,
                 direction: 'forward',
                 indexBounds: { name: [ '["Jen Ford", "Jen Ford"]' ] } } } } },
     rejectedPlans: 
      [ { stage: 'FETCH',
          filter: { name: { '$eq': 'Jen Ford' } },
          inputStage: 
           { stage: 'IXSCAN',
             keyPattern: { _id: 1 },
             indexName: '_id_',
             isMultiKey: false,
             multiKeyPaths: { _id: [] },
             isUnique: true,
             isSparse: false,
             isPartial: false,
             indexVersion: 2,
             direction: 'backward',
             indexBounds: { _id: [ '[MaxKey, MinKey]' ] } } } ] },
  serverInfo: 
   { host: '373ea645996b',
     port: 27017,
     version: '4.2.0',
     gitVersion: 'a4b751dcf51dd249c5865812b390cfd1c0129c30' },
  ok: 1 }

This makes total sense.

However, the following query returns the same result set but uses the aggregation pipeline:

db.people.aggregate([ { $match: { $and: [{ name: "Jen Ford" }]}}, { $sort: {"_id": -1}}], {"explain": true})

Has the following output.

{ queryPlanner: 
   { plannerVersion: 1,
     namespace: 'db.people',
     indexFilterSet: false,
     parsedQuery: { name: { '$eq': 'Jen Ford' } },
     queryHash: '3AE4BDA3',
     planCacheKey: '2A9CC473',
     optimizedPipeline: true,
     winningPlan: 
      { stage: 'FETCH',
        filter: { name: { '$eq': 'Jen Ford' } },
        inputStage: 
         { stage: 'IXSCAN',
           keyPattern: { _id: 1 },
           indexName: '_id_',
           isMultiKey: false,
           multiKeyPaths: { _id: [] },
           isUnique: true,
           isSparse: false,
           isPartial: false,
           indexVersion: 2,
           direction: 'backward',
           indexBounds: { _id: [ '[MaxKey, MinKey]' ] } } },
     rejectedPlans: [] },
  serverInfo: 
   { host: '373ea645996b',
     port: 27017,
     version: '4.2.0',
     gitVersion: 'a4b751dcf51dd249c5865812b390cfd1c0129c30' },
  ok: 1 }

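Notice that the aggregate query does not use the name_1 index for the $match. Instead, it walks the entire _id_ index backward (bounds [MaxKey, MinKey]) and applies the name filter to every fetched document. This has massive implications as the size of the collection grows.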

I've seen this behavior now in Mongo 3.4, 3.6, and 4.2.

https://docs.mongodb.com/v4.2/core/aggregation-pipeline-optimization/ provides this blurb

$sort + $match Sequence Optimization: When you have a sequence with $sort followed by a $match, the $match moves before the $sort to minimize the number of objects to sort.

From all this, I think I'm fundamentally misunderstanding something with the Mongo aggregate command.

I already understand that if I create a compound index on name and _id, then it will be used, since it includes the fields from both my $match and my $sort clauses.
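
For reference, the index described above could be created roughly as follows. This is only a sketch: the helper name `createNameIdIndex` is made up for illustration, and the `-1` direction on `_id` is an assumption chosen to mirror the sort in the query.

```javascript
// Key pattern: the equality field first, then the sort field.
// The -1 direction on _id is an assumption that mirrors { $sort: { _id: -1 } }.
const nameIdIndexSpec = { name: 1, _id: -1 };

// Hypothetical helper; `db` is the database handle the mongo shell provides.
function createNameIdIndex(db) {
  return db.people.createIndex(nameIdIndexSpec);
}

// In the mongo shell:
//   createNameIdIndex(db)
```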

But why must an index include a field from the $sort clause before it can be used to restrict my $match set? Surely we would prefer to $sort the smallest set possible?

No correct solution

OTHER TIPS

The aggregation should utilize the index. The docs also pretty clearly match your expected behavior:

https://docs.mongodb.com/manual/core/aggregation-pipeline/#aggregation-pipeline-operators-and-performance

I wonder if the $and operator is causing an issue.
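
One way to check that guess is to run the pipeline without the $and wrapper and, separately, to force the name_1 index with the `hint` option, then compare the winning plans. The sketch below assumes the mongo shell's `db` handle and the index name from the question; the helper names are hypothetical. Note that the parsedQuery in the question's explain output already shows the $and collapsed to a plain equality, so the operator may not be what changes the plan.

```javascript
// Same pipeline as in the question, with the single-clause $and removed.
const pipeline = [
  { $match: { name: "Jen Ford" } },
  { $sort: { _id: -1 } }
];

// Hypothetical helpers; `db` is the database handle provided by the mongo shell.
function explainPlain(db) {
  return db.people.aggregate(pipeline, { explain: true });
}

function explainWithHint(db) {
  // hint asks the planner to use the named index (name_1 from the question).
  return db.people.aggregate(pipeline, { explain: true, hint: "name_1" });
}

// In the mongo shell, run both and compare the winning plan in each result:
//   explainPlain(db)
//   explainWithHint(db)
```

If the hinted run shows an IXSCAN on name_1 followed by a SORT, comparing its execution stats with the unhinted plan should show which one does less work on this data set.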

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange