Question

I have a sharded collection of documents in mongodb and several application servers accessing it.

Each application contributes new documents and eventually needs to remove some as well.

It doesn't matter which documents are removed, but it's critical that each application removes (claims) an exact number of them, and that no other application removes (claims) the same document(s).

My idea is:

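var unique = makeUniqueValue();
var docs = [];

for (var i = 0; i < 10; i++) {
    // db.collection is a placeholder for the actual collection;
    // in the mongo shell, findAndModify returns the matched document or null
    var r = db.collection.findAndModify({
        query: { claim: false },
        update: { $set: { claim: unique } }
    });
    if (r) docs.push(r);
}

if (docs.length < 10) {
    // release all docs by updating (claim: false) and try again in some time
}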

One potential problem with this solution is that given too many applications (and few docs), they would just keep claiming some documents and releasing them again.

What is the well-known and well-tested solution to this problem?

Do "update" and "findAndModify" guarantee that the updated document still matches the query at the moment it is updated?

Or could another application "steal" it between matching and updating, so that both applications think they've claimed the document?


Solution

Once the update is running on a document, it ensures that the query still matches that document and that it is working on the latest version.

No other program should be able to steal a document, since this is guaranteed on a per-document basis.

To explain a bit further, since I realise this answer is rather bare: MongoDB has a writer-greedy read/write lock at the database level.

This means that findAndModify cannot find anything while a write operation is running. So, for example, it cannot find a document that another thread or application is in the middle of marking as claimed.

So this code isolates the claiming of documents to one application at a time: each iteration of the loop in another application sees only unclaimed documents, never an in-between or partial state on the MongoDB server.
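As a rough sketch of the claim loop with the release step made explicit, it could be wrapped in a function like the one below. The name claimExactly and the coll parameter are hypothetical; coll is assumed to behave like a mongo shell collection, where findAndModify returns the matched document (or null) and updateMany updates every matching document.

```javascript
// Sketch only: claim exactly `n` documents, or release them all and give up.
// Assumption: `coll` behaves like a mongo shell collection object.
function claimExactly(coll, n, unique) {
    var docs = [];
    for (var i = 0; i < n; i++) {
        // The query and the $set are applied to a single document in one step,
        // so only a document that is still unclaimed can be returned here.
        var doc = coll.findAndModify({
            query: { claim: false },
            update: { $set: { claim: unique } }
        });
        if (doc === null) break; // nothing left to claim
        docs.push(doc);
    }
    if (docs.length < n) {
        // Too few documents: release everything claimed under our token.
        coll.updateMany({ claim: unique }, { $set: { claim: false } });
        return null;
    }
    return docs;
}
```

A caller would retry after some delay when the function returns null. Adding a random back-off between retries is one way to reduce the claim-and-release churn described in the question, though that is a suggestion rather than something this answer has tested.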

When actually updating, it doesn't matter, since you know those documents are the ones you need to update. However, operators like $set are run in sequence on a single document, so update operations themselves cannot see a partial document state either: they match either a document with claim: false or nothing. The update also picks the documents directly from the data files, not from a static result set written out beforehand.

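If you were to update using only the _id or another static piece of data, without the claim: false condition in the query, it would be different: the update would no longer check that the document is still unclaimed.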
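The difference can be illustrated with a small in-memory model. This is not MongoDB itself: updateOne here is a hypothetical stand-in that, like a single-document update, matches and modifies in one step. It shows one reading of the point above, namely that filtering on _id alone lets a second application overwrite a claim, while keeping claim: false in the filter does not.

```javascript
// In-memory stand-in for a single-document update; NOT MongoDB itself.
// The match and the modification happen together, as in a real update.
function updateOne(docs, filter, set) {
    for (var i = 0; i < docs.length; i++) {
        var d = docs[i];
        var matches = Object.keys(filter).every(function (k) {
            return d[k] === filter[k];
        });
        if (matches) {
            Object.keys(set).forEach(function (k) { d[k] = set[k]; });
            return 1; // number of documents modified
        }
    }
    return 0;
}

var docs = [{ _id: 1, claim: false }];

// Both applications look up the document while it is still unclaimed.
var seenByA = docs[0]._id;
var seenByB = docs[0]._id;

// Unsafe: filter by _id only. Both updates match, so B overwrites A's claim.
var unsafeA = updateOne(docs, { _id: seenByA }, { claim: "A" });
var unsafeB = updateOne(docs, { _id: seenByB }, { claim: "B" });

// Safe: keep claim: false in the filter. Only the first update can match.
docs[0].claim = false; // reset the document for the second run
var safeA = updateOne(docs, { _id: seenByA, claim: false }, { claim: "A" });
var safeB = updateOne(docs, { _id: seenByB, claim: false }, { claim: "B" });
```

In the unsafe run both calls report one modified document, so both applications believe they own it. In the safe run the second call modifies nothing, which tells application B to look for another document.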

Licensed under: CC-BY-SA with attribution