Question

I'm reading an article about Gizzard, the sharding framework recently released by Twitter (http://engineering.twitter.com/2010/04/introducing-gizzard-framework-for.html). It mentions that all write operations must be idempotent to ensure high reliability.

According to Wikipedia, "Idempotent operations are operations that can be applied multiple times without changing the result." But, IMHO, in the Gizzard case, idempotent write operations should also be ones whose execution order doesn't matter.

Now, my question is: How do I make write operations idempotent?

The only thing I can imagine is to attach a version number to each write. For example, in a blog system, each blog must have a $blog_id and $content. At the application level, we always write blog content like this: write($blog_id, $content, $version). The application guarantees that each $version is unique. So, if an application first sets a blog to "Hello world" and then wants it to be "Goodbye", the writes are idempotent. We have these two write operations:

write($blog_id, "Hello world", 1);
write($blog_id, "Goodbye", 2);

These two operations are supposed to change two different records in the DB. So, no matter how many times or in what order these two operations are executed, the result is the same.
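
A minimal Python sketch of this scheme, assuming an in-memory dictionary stands in for the DB and that each (blog_id, version) pair maps to its own record. The names `VersionedStore`, `latest`, and `replay` are hypothetical and only illustrate the idea; they are not part of Gizzard.

```python
class VersionedStore:
    """In-memory stand-in for the DB: one record per (blog_id, version)."""

    def __init__(self):
        self._records = {}

    def write(self, blog_id, content, version):
        # The key includes the version, so replaying the same write
        # overwrites the record with identical data and changes nothing.
        self._records[(blog_id, version)] = content

    def latest(self, blog_id):
        # The current content is the record with the highest version.
        versions = [v for (b, v) in self._records if b == blog_id]
        if not versions:
            return None
        return self._records[(blog_id, max(versions))]


def replay(ops):
    """Apply a sequence of (blog_id, content, version) writes to a new store."""
    store = VersionedStore()
    for blog_id, content, version in ops:
        store.write(blog_id, content, version)
    return store


hello = (42, "Hello world", 1)
goodbye = (42, "Goodbye", 2)

in_order = replay([hello, goodbye])
reordered_with_retries = replay([goodbye, hello, goodbye, hello])

print(in_order.latest(42))
print(reordered_with_retries.latest(42))
```

Both stores end up with the same two records, so reading the highest version gives the same content regardless of delivery order or retries.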

This is just my understanding. Please correct me if I'm wrong.

Solution

You are absolutely right. Idempotent operations by themselves can provide only one conflict resolution pattern: "last write wins". That is a workable solution if your writes cannot be reordered in time. If they can, you should attach additional information so that conflicts can be resolved automatically. Your idea is not new; in the general case it is called vector clocks.

We use version-based conflict resolution in one of our systems, which collects the change history of objects. Clients asynchronously send the full object state and version information to a history module. The history module can then put the object states in the correct order and save only the deltas to persistent storage. The only restriction is that clients must use some form of concurrency control when changing an object (optimistic locking is a very good method if you track the object's state version).
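
A rough Python sketch of how such a history module might behave, assuming object states are flat dictionaries and versions are integers. `HistoryModule`, `record`, and `deltas` are hypothetical names, and unlike the system described above this sketch keeps full states in memory and computes deltas on demand instead of persisting them.

```python
class HistoryModule:
    """Accepts (version, full state) pairs in any order, possibly duplicated."""

    def __init__(self):
        self._states = {}  # version -> full object state

    def record(self, version, state):
        # Idempotent: a repeated delivery of the same version is ignored.
        self._states.setdefault(version, dict(state))

    def deltas(self):
        """Return [(version, changed_fields)] sorted by version.

        Only added or modified fields are reported; removed fields are
        not tracked in this sketch.
        """
        result = []
        previous = {}
        for version in sorted(self._states):
            state = self._states[version]
            changed = {
                key: value
                for key, value in state.items()
                if key not in previous or previous[key] != value
            }
            result.append((version, changed))
            previous = state
        return result


history = HistoryModule()
# Messages arrive out of order, and version 2 is delivered twice.
history.record(2, {"title": "Post", "body": "Goodbye"})
history.record(1, {"title": "Post", "body": "Hello world"})
history.record(2, {"title": "Post", "body": "Goodbye"})

deltas = history.deltas()
print(deltas)
```

Because each version is stored at most once and the deltas are derived from the sorted versions, late or repeated messages do not change the resulting history.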

OTHER TIPS

You've got the right idea. Setting a particular value is idempotent, because if you carry out that operation more than once, you have the same result. The classic non-idempotent write is an append, because repetition would lead to multiple copies being appended.
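
The difference can be sketched in a few lines of Python, assuming a record is a plain dictionary; `set_value` and `append_value` are hypothetical helpers for illustration only.

```python
def set_value(record, value):
    # Idempotent: the result depends only on the value, not on prior calls.
    record["content"] = value


def append_value(record, value):
    # Not idempotent: every call adds another copy of the value.
    record["content"] += value


set_once = {"content": ""}
set_value(set_once, "Hello")

set_twice = {"content": ""}
set_value(set_twice, "Hello")
set_value(set_twice, "Hello")  # retry of the same write

append_once = {"content": ""}
append_value(append_once, "Hello")

append_twice = {"content": ""}
append_value(append_twice, "Hello")
append_value(append_twice, "Hello")  # retry of the same write

print(set_once == set_twice)
print(append_once == append_twice)
```

Retrying the set leaves the record unchanged, while retrying the append produces a second copy.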

Also, see this previous stackoverflow question.

Licensed under: CC-BY-SA with attribution