Question

I'm running on a 32-bit machine and I'm able to confirm that long values can tear using the following code snippet, which trips the assert very quickly.

        static void TestTearingLong()
        {
            System.Threading.Thread A = new System.Threading.Thread(ThreadA);
            A.Start();

            System.Threading.Thread B = new System.Threading.Thread(ThreadB);
            B.Start();
        }

        static ulong s_x;

        static void ThreadA()
        {
            int i = 0;
            while (true)
            {
                s_x = (i & 1) == 0 ? 0x0L : 0xaaaabbbbccccddddL;
                i++;
            }
        }

        static void ThreadB()
        {
            while (true)
            {
                ulong x = s_x;
                System.Diagnostics.Debug.Assert(x == 0x0L || x == 0xaaaabbbbccccddddL);
            }
        }

But when I try something similar with doubles, I'm not able to get any tearing. Does anyone know why? As far as I can tell from the spec, only assignment to a float is atomic. The assignment to a double should have a risk of tearing.

    static double s_x;

    static void TestTearingDouble()
    {
        System.Threading.Thread A = new System.Threading.Thread(ThreadA);
        A.Start();

        System.Threading.Thread B = new System.Threading.Thread(ThreadB);
        B.Start();
    }

    static void ThreadA()
    {
        long i = 0;

        while (true)
        {
            s_x = ((i & 1) == 0) ? 0.0 : double.MaxValue;
            i++;

            if (i % 10000000 == 0)
            {
                Console.Out.WriteLine("i = " + i);
            }
        }
    }

    static void ThreadB()
    {
        while (true)
        {
            double x = s_x;

            System.Diagnostics.Debug.Assert(x == 0.0 || x == double.MaxValue);
        }
    }

Solution

    static double s_x;

It is much harder to demonstrate the effect when you use a double. The CPU uses dedicated instructions to load and store a double, FLD and FSTP respectively, and each one moves all 64 bits at once. It is much easier with a long, since there is no single instruction that loads or stores a 64-bit integer in 32-bit mode, so the access is done as two 32-bit moves. To observe tearing with a double, the variable's address has to be misaligned so that it straddles a CPU cache line boundary.
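To make the long case concrete, here is a minimal sketch of what a torn read looks like when a 64-bit store is carried out as two 32-bit halves. The names `TornReadDemo` and `Describe` are made up for illustration; the two legitimate values are the ones from the question.

```csharp
using System;

static class TornReadDemo
{
    // The two values thread A alternates between in the question.
    const ulong Zero = 0x0UL;
    const ulong Pattern = 0xaaaabbbbccccddddUL;

    // Hypothetical helper: classifies a value observed by thread B.
    public static string Describe(ulong observed)
    {
        if (observed == Zero || observed == Pattern)
            return "intact";

        uint low = (uint)(observed & 0xffffffffUL);
        uint high = (uint)(observed >> 32);
        uint patternLow = (uint)(Pattern & 0xffffffffUL);
        uint patternHigh = (uint)(Pattern >> 32);

        // A torn read pairs one 32-bit half of Pattern with one half of Zero.
        if (high == patternHigh && low == 0)
            return "torn: high half from the pattern, low half from zero";
        if (high == 0 && low == patternLow)
            return "torn: high half from zero, low half from the pattern";

        return "unexpected";
    }

    static void Main()
    {
        Console.WriteLine(Describe(0xaaaabbbbccccddddUL)); // intact
        Console.WriteLine(Describe(0xaaaabbbb00000000UL)); // torn: high half from the pattern, low half from zero
        Console.WriteLine(Describe(0x00000000ccccddddUL)); // torn: high half from zero, low half from the pattern
    }
}
```

Calling `Describe(x)` just before the assert in ThreadB would show which half was stale when the assert fires.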

That misalignment will never happen with the declaration you used: the JIT compiler ensures that the double is aligned properly, stored at an address that's a multiple of 8. You could store it in a field of a class instead, since the GC allocator only aligns to 4 in 32-bit mode. But that's a crapshoot.
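If you want to check the alignment on your own machine, something along these lines should do. It is only a sketch: `Holder`, `AlignmentCheck` and the two helper methods are invented names, it has to be compiled with unsafe code enabled, and the numbers it prints depend on the runtime and on whether the process is 32-bit.

```csharp
using System;

// Hypothetical class used only to get a double onto the GC heap.
class Holder
{
    public double Value;
}

static unsafe class AlignmentCheck
{
    static double s_x = 0.0;

    // Low three bits of the static field's address; 0 means 8-byte aligned.
    public static ulong StaticFieldMod8()
    {
        fixed (double* p = &s_x)
        {
            return (ulong)p & 7;
        }
    }

    // Low three bits of the instance field's address.
    public static ulong InstanceFieldMod8(Holder holder)
    {
        fixed (double* p = &holder.Value)
        {
            return (ulong)p & 7;
        }
    }

    static void Main()
    {
        Console.WriteLine("static field:   address mod 8 = {0}", StaticFieldMod8());
        Console.WriteLine("instance field: address mod 8 = {0}", InstanceFieldMod8(new Holder()));
    }
}
```

A result of 4 for the instance field is the case the answer is talking about. A result of 0 only means this particular object happened to land on an 8-byte boundary.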

The best way to get a misaligned double is to place it deliberately by using a pointer. Put unsafe in front of the Program class and make it look similar to this:

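    // Requires "using System.Runtime.InteropServices;" and compiling with /unsafe
    static double* s_x;

    static void Main(string[] args) {
        var mem = Marshal.AllocCoTaskMem(100);
        // 28 is a multiple of 4 but not of 8; found by experimentation
        s_x = (double*)((long)(mem) + 28);
        TestTearingDouble();
    }

In ThreadA:

    *s_x = ((i & 1) == 0) ? 0.0 : double.MaxValue;

In ThreadB:

    double x = *s_x;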

This still won't guarantee a good misalignment (hehe), since there's no way to control exactly where AllocCoTaskMem() places the allocation relative to the start of a CPU cache line. It also depends on the cache associativity in your CPU core (mine is a Core i5). You'll have to tinker with the offset; I got the value 28 by experimentation. The value should be divisible by 4 but not by 8 to truly simulate the GC heap behavior. Keep adding 8 to the value until the double straddles a cache line boundary and triggers the assert.
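Instead of stepping the offset by hand, the arithmetic can be done up front. This is a sketch under one assumption that is not in the answer above: a 64-byte cache line, which is what current x86 desktop CPUs use. `CacheLineOffset` and `StraddlingOffset` are made-up names, and the helper only does the address arithmetic, so whether the assert actually fires is still something to verify by experiment.

```csharp
using System;
using System.Runtime.InteropServices;

static class CacheLineOffset
{
    // Hypothetical helper: smallest non-negative offset such that an 8-byte
    // value stored at baseAddress + offset starts 4 bytes before a cache line
    // boundary. Assumes cacheLineSize is a power of two (64 by assumption).
    public static int StraddlingOffset(long baseAddress, int cacheLineSize = 64)
    {
        long mask = cacheLineSize - 1;
        long position = baseAddress & mask;   // where the block starts within its line
        long target = cacheLineSize - 4;      // 4 bytes in this line, 4 in the next
        return (int)((target - position) & mask);
    }

    static void Main()
    {
        IntPtr mem = Marshal.AllocCoTaskMem(100);
        try
        {
            long baseAddress = mem.ToInt64();
            int offset = StraddlingOffset(baseAddress);
            Console.WriteLine("offset = {0}, position in cache line = {1}",
                offset, (baseAddress + offset) & 63);
        }
        finally
        {
            Marshal.FreeCoTaskMem(mem);
        }
    }
}
```

The offset is at most 63, so the 8-byte double still fits inside the 100-byte allocation. When the allocation itself is 8-byte aligned, the result is always a multiple of 4 that is not a multiple of 8, which matches the GC heap simulation described above.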

To make it less artificial, you'll have to write a program that stores the double in a field of a class and gets the garbage collector to move it around in memory so that it ends up misaligned. It's kinda hard to come up with a sample program that ensures this happens.

Also note how your program can demonstrate a problem called false sharing. Comment out the Start() method call for thread B and note how much faster thread A runs. You are seeing the cost of the CPU keeping the cache line consistent between the CPU cores. Sharing is intended here, since the threads access the same variable. Real false sharing happens when threads access different variables that are stored in the same cache line. This is also why alignment matters: you can only observe tearing for a double when part of it is in one cache line and part of it is in another.
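A rough way to put numbers on that cost is to time the writer with and without a reader thread. The sketch below is not a careful benchmark: `SharingCost` and `TimeWrites` are invented names, the iteration count is arbitrary, and the timings will vary from machine to machine and run to run.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class SharingCost
{
    static double s_x;
    static volatile bool s_stop;

    // Hypothetical helper: times a fixed number of writes to s_x, optionally
    // while a second thread keeps reading the same variable.
    public static long TimeWrites(long iterations, bool withReader)
    {
        s_stop = false;
        Thread reader = null;
        if (withReader)
        {
            reader = new Thread(() =>
            {
                double sink = 0.0;
                while (!s_stop)
                {
                    sink = s_x;   // pulls the cache line over to this core
                }
                GC.KeepAlive(sink);
            });
            reader.Start();
        }

        Stopwatch watch = Stopwatch.StartNew();
        for (long i = 0; i < iterations; i++)
        {
            s_x = ((i & 1) == 0) ? 0.0 : double.MaxValue;
        }
        watch.Stop();

        s_stop = true;
        if (reader != null)
        {
            reader.Join();
        }
        return watch.ElapsedMilliseconds;
    }

    static void Main()
    {
        const long iterations = 100000000;
        Console.WriteLine("writer alone:       {0} ms", TimeWrites(iterations, false));
        Console.WriteLine("writer plus reader: {0} ms", TimeWrites(iterations, true));
    }
}
```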

OTHER TIPS

As strange as it sounds, that depends on your CPU. While doubles are not guaranteed not to tear, they won't on many current processors. Try an AMD Sempron if you want tearing in this situation.

EDIT: Learned that the hard way a few years ago.
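Since the lack of tearing is a property of the CPU rather than a guarantee, code that really needs a 64-bit value to be read and written in one piece on a 32-bit machine can go through Interlocked. A minimal sketch, with `AtomicDouble` as a made-up wrapper that keeps the double as its raw 64-bit pattern:

```csharp
using System;
using System.Threading;

static class AtomicDouble
{
    // The double is stored as its 64-bit pattern so that the long overloads
    // of Interlocked can be used.
    static long s_bits;

    public static void Write(double value)
    {
        Interlocked.Exchange(ref s_bits, BitConverter.DoubleToInt64Bits(value));
    }

    public static double Read()
    {
        // Interlocked.Read returns all 64 bits as one atomic operation.
        return BitConverter.Int64BitsToDouble(Interlocked.Read(ref s_bits));
    }

    static void Main()
    {
        Write(double.MaxValue);
        Console.WriteLine(Read() == double.MaxValue); // True
    }
}
```

This is slower than a plain field access, so it only makes sense where a torn value would actually be a bug.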

Doing some digging, I've found some interesting reads concerning floating-point operations on x86 architectures:

According to Wikipedia, the x86 floating-point unit stores floating-point values in 80-bit registers:

[...] subsequent x86 processors then integrated this x87 functionality on chip which made the x87 instructions a de facto integral part of the x86 instruction set. Each x87 register, known as ST(0) through ST(7), is 80 bits wide and stores numbers in the IEEE floating-point standard double extended precision format.

Also this other SO question is related: Some floating point precision and numeric limits question

This could explain why doubles, although they are 64 bits wide, are operated on atomically.

For what it's worth, this topic and a code sample can be found here:

http://msdn.microsoft.com/en-us/magazine/cc817398.aspx

Licensed under: CC-BY-SA with attribution