Question

Windows 7 64 SP1 -- MongoDB 2.2.0-rc2 -- Boost 1.42 -- MS VS 2010 Ultimate -- C++ driver

Following "MongoDB in Action", in the shell:

for(i=0; i<200000; i++){
  db.numbers.save({num: i});
}

db.numbers.find() displays:

{ "_id": ObjectId("4bfbf132dba1aa7c30ac830a"),"num" : 0 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac830b"),"num" : 1 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac830c"),"num" : 2 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac830d"),"num" : 3 }
...

So, replicating in C++:

// Insert 200,000 documents
for (int i = 0; i < 200000; i++)
  c.insert(dc, BSON(GENOID << "num" << i));

// Display the first 20 documents (stop early if the cursor runs out)
Query qu = BSONObj();
auto_ptr<DBClientCursor> cursor = c.query(dc, qu);
for (int i = 0; i < 20 && cursor->more(); i++) {
  cout << cursor->next().toString() << endl;
}

The output:

{ "_id" : ObjectId("504bab737ed339cef0e26829"), "num" : 199924 }
{ "_id" : ObjectId("504bab737ed339cef0e2682a"), "num" : 199925 }
{ "_id" : ObjectId("504bab737ed339cef0e2682b"), "num" : 199926 }
{ "_id" : ObjectId("504bab737ed339cef0e2682c"), "num" : 199927 }
....

Invoking db.numbers.find() in the shell has the same output. Why isn't it starting with {"num" : 0}? It exists:

> db.numbers.find({"num" : 0})
{ "_id" : ObjectId("504bab417ed339cef0df5b35"), "num" : 0 }

The _id for {"num" : 0} sorts before the _id for {"num" : 199924}.

And an index on "_id" exists:

> db.numbers.getIndexes()
[
    {
            "v" : 1,
            "key" : {
                    "_id" : 1
            },
            "ns" : "learning.numbers",
            "name" : "_id_"
    }
]

If I add a sort on _id by changing the query to:

auto_ptr<DBClientCursor> cursor = c.query(dc,qu.sort("_id")); 

then it prints in order:

{ "_id": ObjectId("4bfbf132dba1aa7c30ac830a"),"num" : 0 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac830b"),"num" : 1 }
...

This doesn't happen with a smaller collection (say, 200 documents).

The question: Why does it appear that the C++ query isn't using the collection's index on _id? Or what else explains this apparent anomaly (or my lack of understanding)?


Solution

Indexing and sorting are distinct concepts. You can find data in an index without sorting the results; you can also sort results without using an index (though this isn't recommended).
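
To make the distinction concrete, here is a minimal sketch in plain C++. It does not use the MongoDB driver: Doc, build_index, find_num_by_id, scan_natural and scan_sorted_by_id are hypothetical names invented for this illustration, and a std::map stands in for the _id index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Toy stand-in for a document (hypothetical, not a driver type).
struct Doc {
    int id;
    int num;
};

// Build an index: id -> position in storage. Storage order is left untouched.
std::map<int, std::size_t> build_index(const std::vector<Doc>& storage) {
    std::map<int, std::size_t> index;
    for (std::size_t pos = 0; pos < storage.size(); ++pos)
        index[storage[pos].id] = pos;
    return index;
}

// Find one document through the index; nothing gets sorted.
int find_num_by_id(const std::vector<Doc>& storage,
                   const std::map<int, std::size_t>& index, int id) {
    std::map<int, std::size_t>::const_iterator it = index.find(id);
    assert(it != index.end());  // this sketch assumes the id exists
    return storage[it->second].num;
}

// Unsorted scan: results come back in storage ("natural") order.
std::vector<int> scan_natural(const std::vector<Doc>& storage) {
    std::vector<int> out;
    for (const Doc& d : storage) out.push_back(d.num);
    return out;
}

// Sorted scan: the order is only guaranteed because the caller asked for it.
std::vector<int> scan_sorted_by_id(std::vector<Doc> storage) {
    std::sort(storage.begin(), storage.end(),
              [](const Doc& a, const Doc& b) { return a.id < b.id; });
    std::vector<int> out;
    for (const Doc& d : storage) out.push_back(d.num);
    return out;
}
```

With storage holding {30, 2}, {10, 0}, {20, 1} in that order, find_num_by_id for id 10 returns 0, scan_natural returns 2, 0, 1, and scan_sorted_by_id returns 0, 1, 2. The index exists in all three cases; only the explicit sort changes the order of the results.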

Since you have not specified a sort order for your find(), the results are returned in natural order. For a collection where you have only inserted documents (and never deleted or updated any), the natural order should approximate insertion order. Only a capped collection guarantees that natural order matches insertion order.

Once you start deleting documents or updating them (which may cause them to be moved), free space "gaps" appear in MongoDB's preallocated data files. MongoDB reuses that free space for new document insertions and moves, so over time the natural order no longer matches the insertion order.

If you are expecting results in a specific sort order, you have to include this in your query.

Other tips

@stenni Thank you -- those "gaps" are the problem and led me to the solution. The natural order in the shell appeared to be more "natural" than with the C++ driver, where the results started with a very large "num". However, the fault lies in my methodology:

  1. Insert 200000 documents in shell.
  2. db.numbers.find(); first document listed was {"num" : 0}
  3. db.numbers.remove()
  4. Insert 200000 documents with C++ driver
  5. db.numbers.find(); first document listed was {"num" : SomeVeryLargeNumber}

Instead, I should have used db.numbers.drop(), which actually deletes the collection. Doing so means that the first document listed in step 5 is {"num" : 0}. db.numbers.remove() evidently leaves the gaps in place.
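
The difference between the two calls can be sketched with a toy model in plain C++. This is not MongoDB's allocator: ToyCollection and its members are hypothetical, and the only assumption is that freed slots are reused before new space is appended. The exact order a real server produces depends on its storage internals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a collection's storage (hypothetical, not MongoDB code).
class ToyCollection {
public:
    // Insert: reuse the most recently freed slot if there is one, else append.
    void insert(int num) {
        if (!free_slots_.empty()) {
            std::size_t pos = free_slots_.back();
            free_slots_.pop_back();
            assert(!slots_[pos].used);  // a free slot must not hold a document
            slots_[pos].used = true;
            slots_[pos].num = num;
        } else {
            slots_.push_back(Slot{true, num});
        }
    }

    // Like remove() with no filter: documents go away, slots stay allocated.
    void remove_all() {
        for (std::size_t pos = 0; pos < slots_.size(); ++pos) {
            if (slots_[pos].used) {
                slots_[pos].used = false;
                free_slots_.push_back(pos);
            }
        }
    }

    // Like drop(): the storage itself is discarded, so no gaps remain.
    void drop() {
        slots_.clear();
        free_slots_.clear();
    }

    // Unsorted find(): walk the slots in storage order.
    std::vector<int> find_natural() const {
        std::vector<int> out;
        for (const Slot& s : slots_)
            if (s.used) out.push_back(s.num);
        return out;
    }

private:
    struct Slot {
        bool used;
        int num;
    };
    std::vector<Slot> slots_;
    std::vector<std::size_t> free_slots_;
};
```

In this model, inserting 0 through 4, calling remove_all(), and inserting 0 through 4 again makes find_natural() start with 4, because the new documents land in the old gaps. Calling drop() before the second round of inserts makes it start with 0 again.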

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow