Question

I'm trying to understand why the auto-increment pattern is bad when scaling.

I've also read this article, which contains the following passage:

  • Instead, you need to use a proper UUID method to ensure that you don’t hit race conditions and the ID is truly unique across clusters.

I'm trying to find out the exact circumstances under which _id values get duplicated across shards.

Another question: what about auto-increment for a non-primary key? Is it safe?

Thank you very much!


Solution

In order to guarantee that an auto-increment value is unique, ID creation must occur on a single thread on a single host (even if multiple threads are used, the point of ID creation must block the other threads). So, in a cluster of 100 servers, IDs must be created on 1 thread on 1 out of the 100 servers. If each server instead kept its own counter, the servers would independently hand out the same values (each would issue 1, 2, 3, ...), which is how duplicate IDs arise across shards. The single point of creation is not just a performance bottleneck: it is also possible for the creation of 2 auto-increment IDs to block each other, which is the race condition noted in the quotation you've cited.
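A minimal Python sketch of the two situations the answer contrasts: independent per-server counters, which collide, and one lock-guarded counter, which stays unique but forces every request through a single point. The `LocalCounter` and `CentralCounter` classes and the helper functions are hypothetical; threads on one machine stand in for servers, and in a real cluster each call to the central counter would also cost a network round trip.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LocalCounter:
    """Hypothetical per-server auto-increment counter with no coordination."""

    def __init__(self) -> None:
        self._next = 0

    def next_id(self) -> int:
        self._next += 1
        return self._next


class CentralCounter:
    """Hypothetical cluster-wide counter: the lock lets only one thread
    at a time hand out an ID."""

    def __init__(self) -> None:
        self._next = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:  # every caller serializes here: the bottleneck
            self._next += 1
            return self._next


def ids_from_independent_servers(servers: int, per_server: int) -> list[int]:
    # Each simulated server owns its counter, so every one emits 1, 2, 3, ...
    counters = [LocalCounter() for _ in range(servers)]
    return [c.next_id() for c in counters for _ in range(per_server)]


def ids_from_central_counter(workers: int, per_worker: int) -> list[int]:
    # All simulated servers request IDs concurrently from one shared counter.
    counter = CentralCounter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(counter.next_id) for _ in range(workers * per_worker)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    independent = ids_from_independent_servers(servers=3, per_server=4)
    print(len(independent), len(set(independent)))  # 12 4 -> duplicates
    central = ids_from_central_counter(workers=3, per_worker=4)
    print(len(central), len(set(central)))  # 12 12 -> all unique
```

The central version is correct only because every ID passes through one lock on one counter, which is exactly the single point of creation described above.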

It should be noted that transactional relational databases such as Oracle and SQL Server have solved the race condition problem, but they offer no solution to the performance bottleneck.
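The UUID approach recommended in the article quoted in the question avoids the central coordinator altogether. A minimal Python sketch using the standard library's `uuid.uuid4()`; the `generate_ids` helper and the thread-per-node setup are hypothetical stand-ins for independent servers.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


def generate_ids(node_count: int, per_node: int) -> list[uuid.UUID]:
    """Hypothetical helper: each simulated node generates random (version 4)
    UUIDs on its own, with no shared counter or lock between nodes."""

    def node_batch(_: int) -> list[uuid.UUID]:
        return [uuid.uuid4() for _ in range(per_node)]

    with ThreadPoolExecutor(max_workers=node_count) as pool:
        return [u for batch in pool.map(node_batch, range(node_count)) for u in batch]


if __name__ == "__main__":
    ids = generate_ids(node_count=4, per_node=250)
    print(len(ids), len(set(ids)))
```

Uniqueness here is probabilistic rather than guaranteed (a version 4 UUID carries 122 random bits), but collisions are vanishingly unlikely in practice, and no node ever waits on another.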

The same constraint applies whether or not the column is the primary key, so: no, don't use auto-increment on non-primary keys either if you anticipate the need to scale your system.

Licensed under: CC-BY-SA with attribution