Question

In my app I'm letting mongo generate order id's via its ObjectId method.

But in user testing we've had some concerns that the order id's are humanly 'intimidating', i.e. if you need to discuss your order with someone over the telephone, reading out 24 alphanumeric characters is a bit tedious.

At the same time, I don't really want to have to store two different id's, one 'human-accessible' and one used by mongo internally.

So my question is this - is there a way to choose a substring of length 6 or even 8 of the mongo objectId string that I could be fairly sure would be unique ?

For example if I have a mongo objectid like this

id = '4b28dcb61083ed3c809e0416'

maybe I could take out

human_id = id.substr(0,7);

and be sure that i'd always get unique id's for my orders...

The advantage of course is that these are orders, and so are human-created, and so there aren't millions of them per millisecond. On the other hand, it would really be a problem if two orders had the same shortened id...

--- clearer explanation ---

I guess a better way to ask my question would be this :

If I decide for example to just use the last 6 characters of a mongo id, is there some kind of measure of 'probability' that just these 6 characters would repeat in a given week ?

Given a certain number of mongo's running in parallel, a certain number of users during the week, etc.

Was it helpful?

Solution

If you have multiple web servers, with multiple processes, then there really isn't something you can remove with losing uniqueness.

If you look at the nature of the ObjectId:

  • a 4-byte value representing the seconds since the Unix epoch,
  • a 3-byte machine identifier,
  • a 2-byte process id, and
  • a 3-byte counter, starting with a random value.

You'll see there's not much there that you could safely remove. As the first 4 bytes are time, it would be challenging to implement an algorithm that removed portions of the time stamp in a clean and safe way.

The machine identifier and process identifier are used in cases where there are multiple servers and/or processes acting as clients to the database server. If you dropped either of those, you could end up with duplicates again. The random value as the last 3 bytes is used to make sure that two identifiers, on the same machine, within the same process are unique, even when requested frequently.

If you were using it as an order id, and you want assured uniqueness, I wouldn't trim anything away from the 12 byte number as it was carefully designed to provide a robust and efficient distributed mechanism for generating unique numbers when there are many connected database clients.

If you took the last 5 characters of the ObjectId ..., and in a given period, what's the probability of conflict?

  • process id
  • counter

The probability of conflict is high. The process id may remain the same through the entire period, and the other number is just an incrementing number that would repeat after 4095 orders. But, if the process recycles, then you also have the chance that there will be a conflict with older orders, etc. And if you're talking multiple database clients, the chances increase as well. I just wouldn't try to trim the number. It's not worth the unhappy customers trying to place orders.

Even the timestamp and the random seed value aren't sufficient when there are multiple database clients generating ObjectIds. As you start to look at the various pieces, especially in the context of a farm of database clients, you should see why the pieces are there, and why removing them could lead to a meltdown in ObjectId generation.

I'd suggest you implement an algorithm to create a unique number and store it in the database. It's simple enough to do. It does impact performance a bit, but it's safe.

I wrote this answer a while ago about the challenges of using an ObjectId in a Url. It includes a link to how to create a unique auto incrementing number using MongoDB.

OTHER TIPS

Actually what you choose for and Id (actually _id in MongoDB storage) is totally up to you. If there is some useful data you can keep in _id as long as you keep it unique, then do so. If it has to be something valid to url encoding, then do so.

By default, if you do not specify an _id then that field will be populated with the value you have come to love and hate. But if you explicitly use it, then you will get what you want.

The extra thing to keep in mind is that even if you specify an addtional unique index field, let's say order_id then MongoDB would actually have to check through that and other indexes on a query plan to see which one was best to use. But if _id was your key, the plan would give up and go strait for the 'Primary Key', and this is going to be a lot faster.

So make your own Id just as long as you can ensure it will be unique.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top