Question

While manually querying objects in MongoDB by ObjectId, I noticed a problem in a MongoDB driver: it seems to interpret parts of the BSON ObjectId incorrectly. I am trying to fix it but cannot find a decent spec to work from.

In the MongoDB documentation the ObjectId is defined as a 12-byte value consisting of:

* a 4-byte value representing the seconds since the Unix epoch,
* a 3-byte machine identifier,
* a 2-byte process id, and
* a 3-byte counter, starting with a random value

Around the net I find this statement quoted:

"Note that the timestamp and counter fields must be stored big endian unlike the rest of BSON."

But I cannot find its source. It is plausible insofar as the timestamps in the ids I can see in mongo are indeed big-endian. Most ObjectIds have mostly zero values set, so it is hard to tell. My question is where that big-endian definition comes from, and whether it is really the case that

* time is big-endian
* machine id is little-endian
* process id is little-endian
* counter is big-endian

Solution

Yes, that is apparently correct. From a comment in the MongoDB server source code at src/mongo/bson/oid.h:

Typical contents of the BSON ObjectID is a 12-byte value consisting of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter. Note that the timestamp and counter fields must be stored big endian unlike the rest of BSON. This is because they are compared byte-by-byte and we want to ensure a mostly increasing order.
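To make the layout in that comment concrete, here is a minimal Python sketch that splits a 12-byte id into its four fields. The function name decode_object_id and the sample id are only illustrative and are not taken from the server or any driver. The timestamp and counter are read big-endian as the comment requires; the machine id and process id are left as raw bytes, since their byte order is exactly what is in question.

```python
import struct
from datetime import datetime, timezone


def decode_object_id(hex_id):
    """Split a 24-character hex ObjectId into its four fields (illustrative helper)."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    # Bytes 0-3: seconds since the Unix epoch, big-endian.
    (timestamp,) = struct.unpack(">I", raw[0:4])
    # Bytes 9-11: 3-byte counter, big-endian.
    counter = int.from_bytes(raw[9:12], "big")
    return {
        "timestamp": timestamp,
        "created": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "machine": raw[4:7],    # kept as raw bytes, no byte order assumed
        "process": raw[7:9],    # kept as raw bytes, no byte order assumed
        "counter": counter,
    }


# Sample id chosen for illustration only.
fields = decode_object_id("507f1f77bcf86cd799439011")
print(fields["created"].isoformat(), fields["machine"].hex(), fields["counter"])
```

If the creation time printed for one of your own ids looks plausible, the big-endian reading of the timestamp holds for that id.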

(emphasis mine).

There is also a lot of endian-swapping code around the timestamp, so the comment does not appear to be outdated, and the byte-ordering rationale makes sense.
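The swapping is needed because the rest of BSON stores integers little-endian, so the same timestamp has two different byte layouts. The following Python snippet only illustrates that difference; it is not the server's code, and the timestamp value is an arbitrary example.

```python
import struct

timestamp = 0x507F1F77  # arbitrary example value, seconds since the Unix epoch

as_little = struct.pack("<I", timestamp)  # layout used by ordinary BSON int32 values
as_big = struct.pack(">I", timestamp)     # layout the comment requires inside an ObjectId

print(as_little.hex())  # 771f7f50
print(as_big.hex())     # 507f1f77

# Swapping the byte order converts one layout into the other.
swapped = as_little[::-1]
```

Reading a big-endian timestamp as if it were little-endian yields a very different number, which is one quick way to spot a driver that gets the order wrong.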

OTHER TIPS

Looking at the source code of the Go driver (which the MongoDB people have praised as the most advanced and well-written driver), it is clear that all fields in the ObjectId are stored big-endian:

http://bazaar.launchpad.net/+branch/mgo/v2/view/head:/bson/bson.go#L295

The purpose is almost certainly to allow ObjectIds to be sorted in lexicographic order (byte by byte) while having the numeric values appear in order.
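The ordering argument can be checked with a small Python experiment. The values below are arbitrary and stand in for timestamps; Python compares bytes objects lexicographically, which matches a byte-by-byte comparison.

```python
import struct

values = [1, 255, 256, 65536]  # already in increasing numeric order

big = [struct.pack(">I", v) for v in values]
little = [struct.pack("<I", v) for v in values]

# Sorting the encoded bytes preserves numeric order only for big-endian.
big_keeps_order = sorted(big) == big
little_keeps_order = sorted(little) == little

print(big_keeps_order)     # True
print(little_keeps_order)  # False
```

With little-endian encoding the least significant byte is compared first, so 256 (00 01 00 00) sorts before 1 (01 00 00 00).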

The machine identifier is treated as an opaque 3-byte array, so it's not necessarily in any order.

Licensed under: CC-BY-SA with attribution