Question

I have multiple clients accessing a Mongo cluster. Sometimes they need to create new collections. They call ensureIndex before doing any inserts.

Now I want to shard these collections. I intended to make each client call shardCollection before inserting into a new collection. But the clients are not coordinated with one another, so several clients might call shardCollection on the same (new) collection at once. (They will check if the collection exists first, but there's an inevitable race condition.)

The Mongo shardCollection documentation says:

Warning: Do not run more than one shardCollection command on the same collection at the same time.

Does this mean I have to either coordinate the clients, or pre-create collections from a dedicated separate process? (The set of possible collections isn't finite, so pre-creating is hard.)

Or is there a way to make two parallel shardCollection calls safe? I can guarantee that:

  • The multiple calls to shardCollection will be identical (same shard key, etc).
  • Each app will wait for its own call to shardCollection to complete before doing any inserts.
  • Therefore, shardCollection will complete successfully at least once on an empty collection before any documents are inserted.

Finally, the Mongo shell command sh.shardCollection doesn't include the warning above. It's implemented in the Mongo shell, so my driver (reactivemongo) doesn't provide it. Does that mean it includes some logic I should duplicate?

Rationale: my collections are logically partitioned by date and other parameters. That is, the collection name specifies a day and those other parameters. I create each collection I need, and call ensureIndex, before the first insert. This allows me to efficiently drop/backup/restore old collections.


Solution

Assuming the collection passes all the relevant checks (it is not capped, the shard key is valid, it is not a system collection, etc.), issuing a second shardCollection command should simply return a message saying the collection is already sharded (see here). If you guarantee that the commands are identical (same shard key for each namespace), then you at least remove the race between competing, conflicting requests.
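To make the "already sharded" case concrete, here is a minimal sketch of how a client might classify the reply to shardCollection so that a repeat call counts as success. The function name interpretShardReply is hypothetical, and matching on the text "already sharded" is an assumption: the exact error message or code depends on the server version, so check what your cluster actually returns. Some drivers raise an error on a failed command instead of returning the reply document, in which case the same check would go in the error handler.

```javascript
// Hypothetical helper: classifies the reply document returned by a
// shardCollection command so that callers can treat a repeat call as success.
function interpretShardReply(reply) {
  // ok: 1 means this call sharded the collection.
  if (reply && reply.ok === 1) {
    return "sharded";
  }
  // ASSUMPTION: a repeat call fails with an errmsg containing
  // "already sharded". Verify the wording (or use the error code)
  // on the MongoDB version you run.
  if (reply && typeof reply.errmsg === "string" &&
      /already sharded/i.test(reply.errmsg)) {
    return "already-sharded";
  }
  // Anything else is a genuine failure and should not be swallowed.
  throw new Error("shardCollection failed: " + JSON.stringify(reply));
}
```

A client would proceed with its inserts on either "sharded" or "already-sharded" and abort on a thrown error.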

The big question is whether there is a problematic race when the initial shardCollection command has not yet completed and another, identical command is issued, and what impact that might have. I think the only realistic way to find out is to test it. You may simply have to implement a check before allowing such a command to run, so that you avoid the race in the first place.
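One possible shape for such a check, as a sketch only: look up the namespace in config.collections (through mongos) and skip the command if an entry already exists. The helper name isAlreadySharded is hypothetical, and the handling of the dropped field assumes the metadata layout of older server versions. Note that a check like this only narrows the window. Two clients can still both see "not sharded" and both issue the command, so it does not replace coordination or testing.

```javascript
// Hypothetical helper: decides whether shardCollection can be skipped, given
// the document (or null) found in config.collections for _id: "<db>.<coll>".
function isAlreadySharded(configCollectionsDoc) {
  // No metadata entry: the collection has never been sharded.
  if (configCollectionsDoc === null || configCollectionsDoc === undefined) {
    return false;
  }
  // ASSUMPTION: older servers keep the entry after a drop and mark it with
  // dropped: true; treat such an entry as "not sharded".
  return configCollectionsDoc.dropped !== true;
}
```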

As for running the command: if the driver has not implemented a helper for you, it usually provides a way to run a raw command. This is the case with reactivemongo (based on these docs). If you look at the shell helper's code (type its name without parentheses to print the source), you will see that it is just some quick sanity checks on the arguments followed by the command call itself:

> sh.shardCollection
function ( fullName , key , unique ) {
    sh._checkFullName( fullName )
    assert( key , "need a key" )
    assert( typeof( key ) == "object" , "key needs to be an object" )

    var cmd = { shardCollection : fullName , key : key }
    if ( unique )
        cmd.unique = true;

    return sh._adminCommand( cmd );
}

The document stored in the cmd variable is the piece you want when constructing your own command. Note that the helper runs it against the admin database (via sh._adminCommand), so your raw command must target the admin database too.
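The same logic can be reproduced outside the shell. The sketch below mirrors the helper above as a standalone function that only builds the command document. The name buildShardCollectionCommand is hypothetical, and the fully-qualified-name check is an approximation of what sh._checkFullName does. How the resulting document is sent to the admin database depends on your driver.

```javascript
// Hypothetical standalone equivalent of the shell's sh.shardCollection,
// returning the command document instead of running it.
function buildShardCollectionCommand(fullName, key, unique) {
  // Approximates sh._checkFullName: expect "<db>.<collection>".
  if (typeof fullName !== "string" || fullName.indexOf(".") < 1) {
    throw new Error("name needs to be fully qualified <db>.<collection>");
  }
  if (key === null || typeof key !== "object") {
    throw new Error("key needs to be an object");
  }
  // shardCollection must be the first field, since it names the command.
  var cmd = { shardCollection: fullName, key: key };
  if (unique) {
    cmd.unique = true;
  }
  return cmd;
}
```

For example, buildShardCollectionCommand("logs.events_20140101", { _id: 1 }) yields the document to pass to the driver's raw-command call; the namespace here is made up for illustration.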

Licensed under: CC-BY-SA with attribution