Question

In order to select the 100 newest documents from MongoDB, where each document is composed from multiple documents in the same collection that have a similar field (in this case timestamp), I'm using the following series of queries in Node.js:

        return q.ninvoke(collection, 'aggregate',
            [
                {
                    $match  : { active: true }
                },
                {
                    $limit  : 100
                },
                {
                    $group  : {
                        _id         : "$timestamp",
                        mintime : {
                            $min        : "$seconds"
                        },
                        timestamp   : {
                            $first      : "$timestamp"
                        },
                        data        : {
                            $first      : "$data"
                        }
                    }
                }
            ]);

This works fine when there are fewer than $limit documents in the collection. When there are more, it selects the oldest documents (those inserted first), not the documents with the highest timestamp (which is often, but not always, the last one inserted).

This is unexpected, as documents are inserted into the collection with the following ensured index:

collection.ensureIndex({
    timestamp   : -1,
    seconds     : -1,
    active      : -1
}, {
    sparse : false
});

I was under the impression that the -1 on timestamp, the first key of the index, meant that documents were indexed in descending order, resulting in a collection where the first $limit documents would always be the ones with the highest timestamp.

Why doesn't this work as expected?
Am I wrong?

Solution

Your real problem here is that the index is not being selected. You can check this with the explain option (available in MongoDB 2.6, and in fact from MongoDB 2.4.9, though not documented there) by invoking aggregate through the db.runCommand form.
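As a minimal sketch of that check, the command document below repeats the question's $match and $limit stages and sets explain to true. The collection name "documents" and the helper name buildExplainCommand are placeholders for illustration, not taken from the question.

```javascript
// Hypothetical helper: builds the command document for an explained aggregate.
// The collection name passed in is an assumption for illustration.
function buildExplainCommand(collectionName, pipeline) {
    return {
        aggregate : collectionName,   // must be the first key of the command
        pipeline  : pipeline,
        explain   : true              // ask for the plan instead of the results
    };
}

var command = buildExplainCommand('documents', [
    { $match : { active: true } },
    { $limit : 100 }
]);

// In the mongo shell this would be run as: db.runCommand(command)
console.log(JSON.stringify(command, null, 2));
```

The output of the command should show whether an index cursor or a full collection scan is used for the $match stage.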

With MongoDB it is very important that the field you match on comes first in the index you want to be used. So an index defined as:

collection.ensureIndex({ "active": 1 })

or even with -1, would get selected in this case. Your index is not selected because the $match references only active, and not the timestamp and seconds fields that precede it in the index.
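The rule can be illustrated with a deliberately simplified check. This is only a toy model of the leading-key rule, not MongoDB's actual query planner, and the function name leadingKeyMatches is hypothetical.

```javascript
// Simplified model of the leading-key rule, NOT MongoDB's real planner:
// an index is treated as usable for a match only if its first key
// is one of the matched fields.
function leadingKeyMatches(indexSpec, matchSpec) {
    var leadingKey = Object.keys(indexSpec)[0];
    return Object.prototype.hasOwnProperty.call(matchSpec, leadingKey);
}

var questionIndex = { timestamp: -1, seconds: -1, active: -1 };
var singleIndex   = { active: 1 };
var match         = { active: true };

console.log(leadingKeyMatches(questionIndex, match)); // false
console.log(leadingKeyMatches(singleIndex, match));   // true
```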

The optimizer can still pick such an index for larger selections when it recognizes that this would be the optimal case, but this appears to be broken in current 2.6 releases (until fixed).

Addendum: There is possibly also a "sorting" component involved, but that again comes down to how you specify the compound index. To ensure your "timestamp" values are in order for the grouping boundaries, make sure you include that field after the initial selector, as in:

collection.ensureIndex({ "active": -1, "timestamp": -1 })

In your required order.
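One way to make the ordering explicit is sketched below: a $sort stage placed between $match and $limit, so that $limit keeps the highest timestamps rather than whatever comes first in natural order. The $sort stage is an assumption added here rather than something stated above, and the function name newestFirstPipeline is hypothetical; the other stages are copied from the question.

```javascript
// Hypothetical helper: the question's pipeline with an explicit sort added.
function newestFirstPipeline(limit) {
    return [
        { $match : { active: true } },
        // Assumption: sort newest first so $limit keeps the highest timestamps
        { $sort  : { timestamp: -1 } },
        { $limit : limit },
        { $group : {
            _id       : "$timestamp",
            mintime   : { $min   : "$seconds" },
            timestamp : { $first : "$timestamp" },
            data      : { $first : "$data" }
        } }
    ];
}

var pipeline = newestFirstPipeline(100);

// Print the stage names in order
console.log(pipeline.map(function (stage) { return Object.keys(stage)[0]; }));
```

The resulting array could be passed to q.ninvoke(collection, 'aggregate', pipeline) as in the question. Since $group does not guarantee the order of its output, a final $sort may still be needed if the caller relies on ordered results.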

Other tips

A very important supplement to the answer given by @NeilLunn:

I don't know the technical details, but even the correct statement can consistently select the wrong documents from the index if your disk space is "low". Mongo might not even complain about this; it will just select the wrong documents.

Even though MongoDB will create four sparse files of a gigabyte each, Mongo can still choke if the free space drops below a gigabyte.

If this happens, free up at least two gigabytes and defragment the data:

  • /etc/init.d/mongodb stop

  • mongod --repair

  • /etc/init.d/mongodb start

As a rule of thumb I would say: Keep at least 2̶G̶B̶ 2 + 4 = 6GB free at all times.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow