C#/CLR: MemoryBarrier and torn reads

Question 1

I took a long hard look at your code and it does appear correct to me. One thing that immediately jumped out at me was that you used an established pattern for performing the low-lock operation. I can see that you are using version as a kind of virtual lock. Even numbers are released and odd numbers are acquired. And since you are using a monotonically increasing value for the virtual lock you are also avoiding the ABA problem. The most important thing, however, is that you continue to loop while attempting to read until the virtual lock value is observed to be the same before the read starts as compared to after it completes. Otherwise, you consider this a failed read and try it all over again. So yeah, job well done on the core logic.

So what about the placement of the memory barrier generators? Well, this all looks pretty good as well. All of the Thread.MemoryBarrier calls are required. If I had to nit-pick I would say you need one additional one in the Write method so that it looks like this.

public void Write(T value)
{
    // locks are full barriers
    lock (write)
    {
        ++version;             // ++version odd: write in progress
        Thread.MemoryBarrier();
        this.value = value;
        Thread.MemoryBarrier();
        ++version;             // ++version even: write complete
    }
}

The added call here ensures that ++version and this.value = value do not get swapped. Now, the ECMA specification technically allows that kind of instruction reordering. However, Microsoft's implementation of the CLI and the x86 hardware both already have volatile semantics on writes so it would not really be needed in most cases. But, who knows, maybe it would be necessary on the Mono runtime targeting the ARM cpu.

On the Read side of things I can find no faults. In fact, the placement of the calls you have is exactly where I would have put them. Some people may wonder why you do not need one before the initial read of version. The reason is because the outer loop will catch the case when the first read was cached because of the Thread.MemoryBarrier further down.

So this brings me to a discussion about performance. Is this really faster than taking a hard lock in the Read method? Well, I did some pretty extensive testing of your code to help answer that. The answer is a definitive yes! This is quite a bit faster than taking a hard lock. I tested using a Guid as the value type because it is 128 bits and so that is bigger than the native word size of my machine (64 bits). I also used several different variations on the number of writers and readers. Your low lock technique consistently and significantly outperformed the hard lock technique. I even tried a few variations using Interlocked.CompareExchange to do the guarded read and they were all slower as well. In fact, in some situations it was actually slower than taking the hard lock. I have to be honest. I was not at all surprised by this.

I also did some pretty significant validity testing. I created tests that would run for quite some time and not once did I see a torn read. And then as a control test I would tweak the Read method in such a manner that I knew it would be incorrect and I ran the test again. This time, as expected, torn reads started to appear randomly. I switched the code back to what you have and the torn reads disappeared; again, as expected. This seemed to confirm what I already expected. That is, your code looks correct. I do not have a wide variety of runtime and hardware environments to test with (nor do I have the time) so I am not willing to give it a 100% seal of approval, but I do think I can give your implementation two thumbs up for now.

Finally, with all that said, I would still avoid putting this in production. Yeah, it may be correct, but the next guy who has to maintain the code is probably not going to understand it. Someone may change the code and break it because they do not understand the consequences of their changes. You have to admit that this code is pretty brittle. Even the slightest change could break it.

Question 2

It seems like you're interesting in lock-free /wait-free implementation. Let start from this discussion, for example: Lock-free multi-threading is for real threading experts