Interlocked.Read / Interlocked.Exchange much slower on Mono than .NET?

https://stackoverflow.com/questions/9179461

26-04-2021

Question

Sorry for the long question, but there's a Jon Skeet reference, so it may be worthwhile for some.

In short:
Interlocked.Read / Interlocked.Exchange seem to perform much slower while running in the Mono framework than while running in the .NET framework. I'm curious to know why.

In long:
I wanted a thread-safe double for 32-bit platforms, so I made this struct:

public interface IThreadSafeDouble
{
    double Value { get; set; }
}

public struct LockedThreadSafeDouble : IThreadSafeDouble
{
    private readonly object Locker;
    private double _Value;

    public double Value
    {
        get { lock (Locker) return _Value; }
        set { lock (Locker) _Value = value; }
    }

    public LockedThreadSafeDouble(object init)
        : this()
    {
        Locker = new object();
    }
}

Then I read Jon Skeet's answer to this question, so I made this struct:

public struct InterlockedThreadSafeDouble : IThreadSafeDouble
{
    private long _Value;

    public double Value
    {
        get { return BitConverter.Int64BitsToDouble(Interlocked.Read(ref _Value)); }
        set { Interlocked.Exchange(ref _Value, BitConverter.DoubleToInt64Bits(value)); }
    }
}

Then I wrote this test:

    private static TimeSpan ThreadSafeDoubleTest2(IThreadSafeDouble dbl)
    {
        var incrementTarg = 10000000;
        var sw = new Stopwatch();
        sw.Start();
        for (var i = 0; i < incrementTarg; i++, dbl.Value++);
        sw.Stop();
        return sw.Elapsed;
    }

    private static void ThreadSafeTest()
    {
        var interlockedDbl = new InterlockedThreadSafeDouble();
        var interlockedTim = ThreadSafeDoubleTest2(interlockedDbl);

        var lockedDbl = new LockedThreadSafeDouble(true);
        var lockedTim = ThreadSafeDoubleTest2(lockedDbl);

        System.Console.WriteLine("Interlocked Time: " + interlockedTim);
        System.Console.WriteLine("Locked Time:      " + lockedTim);
    }       

    public static void Main(string[] args)
    {
        for (var i = 0; i < 5; i++)
        {
            System.Console.WriteLine("Test #" + (i + 1));
            ThreadSafeTest();
        }
        System.Console.WriteLine("Done testing.");
        System.Console.ReadLine();
    }

And I got this result using the .NET framework: [image: .NET Interlocked test results]

And this result using the Mono framework: [image: Mono Interlocked test results]

I've run both tests multiple times on the same machine (Windows XP) and the results are consistent. I'm curious to know why Interlocked.Read/Interlocked.Exchange seem to perform so much slower on the Mono framework.

Update:

I wrote the following, simpler test:

long val = 1;
var sw = new Stopwatch();
sw.Start();
for (var i = 0; i < 100000000; i++) {
    Interlocked.Exchange(ref val, 2);
    // Interlocked.Read(ref val);
}
sw.Stop();
System.Console.WriteLine("Time: " + sw.Elapsed);

The .NET framework consistently returns ~2.5 seconds with both Exchange and Read. The Mono framework returns ~5.1 seconds.

Solution

Drawing performance conclusions is not so easy. In the first example, the long<->double conversion can be an important factor. After changing all doubles to longs (and removing the conversions), these are my times on 32-bit Mono on Windows:

Test #1
Interlocked Time: 00:00:01.2548628
Locked Time:      00:00:01.7281594
Test #2
Interlocked Time: 00:00:01.2466018
Locked Time:      00:00:01.7219013
Test #3
Interlocked Time: 00:00:01.2590181
Locked Time:      00:00:01.7443508
Test #4
Interlocked Time: 00:00:01.2575325
Locked Time:      00:00:01.7309012
Test #5
Interlocked Time: 00:00:01.2593490
Locked Time:      00:00:01.7528010
Done testing.

So the Interlocked implementation wasn't the biggest factor here.
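The answer does not show the long-only variant it measured, so the following is only a sketch of what "changing all doubles to longs" might look like. The names IThreadSafeLong, InterlockedThreadSafeLong and LockedThreadSafeLong are hypothetical, chosen to mirror the types in the question.

```csharp
using System.Threading;

// Hypothetical long-only counterpart of IThreadSafeDouble from the question.
public interface IThreadSafeLong
{
    long Value { get; set; }
}

// Same shape as InterlockedThreadSafeDouble, minus the BitConverter calls.
public struct InterlockedThreadSafeLong : IThreadSafeLong
{
    private long _Value;

    public long Value
    {
        get { return Interlocked.Read(ref _Value); }
        set { Interlocked.Exchange(ref _Value, value); }
    }
}

// Same shape as LockedThreadSafeDouble; the constructor argument is unused,
// as in the question, and only exists so that Locker gets initialized.
public struct LockedThreadSafeLong : IThreadSafeLong
{
    private readonly object Locker;
    private long _Value;

    public long Value
    {
        get { lock (Locker) return _Value; }
        set { lock (Locker) _Value = value; }
    }

    public LockedThreadSafeLong(object init)
        : this()
    {
        Locker = new object();
    }
}
```

With these types, the test loop from the question would stay the same apart from taking an IThreadSafeLong, so any remaining difference between runtimes could no longer be attributed to the BitConverter calls.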

But then there is the second example, which has no conversions. Why is it slower on Mono? I think the answer is loop unrolling, which the .NET JIT compiler does better. But that is just a guess. If you want to compare Interlocked performance in a real-life scenario, you have (at least) two options:

Compare them in the real life scenario.
Compare the machine code emitted by JIT compilers and see the exact implementation of Interlocked.

Also note that the only guarantee given by the above implementation is that you won't observe tearing. For instance, it does not give you the (usually needed) guarantee that if two threads are incrementing the value, the sum will be correct (i.e. that every increment will be taken into account), because dbl.Value++ is a separate read followed by a separate write.
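If lost increments matter, one common approach is a compare-and-swap retry loop. The following is a minimal sketch under that assumption, not something from the question or the answer: the class name AtomicDouble and its Add method are hypothetical, and a class is used instead of a struct so that callers cannot accidentally update a copy.

```csharp
using System;
using System.Threading;

// Hypothetical helper: stores the double as its 64-bit pattern, like
// InterlockedThreadSafeDouble in the question, and adds an atomic Add.
public sealed class AtomicDouble
{
    private long _bits;

    public double Value
    {
        get { return BitConverter.Int64BitsToDouble(Interlocked.Read(ref _bits)); }
        set { Interlocked.Exchange(ref _bits, BitConverter.DoubleToInt64Bits(value)); }
    }

    // Retries until no other thread changed the value between the read and the write.
    public double Add(double delta)
    {
        while (true)
        {
            long oldBits = Interlocked.Read(ref _bits);
            double newValue = BitConverter.Int64BitsToDouble(oldBits) + delta;
            long newBits = BitConverter.DoubleToInt64Bits(newValue);

            // CompareExchange returns the value it found; the swap happened
            // only if that value is still the one we read.
            if (Interlocked.CompareExchange(ref _bits, newBits, oldBits) == oldBits)
                return newValue;
        }
    }
}

public static class AtomicDoubleDemo
{
    public static void Main()
    {
        var counter = new AtomicDouble();
        ThreadStart work = () =>
        {
            for (var i = 0; i < 100000; i++) counter.Add(1.0);
        };

        var t1 = new Thread(work);
        var t2 = new Thread(work);
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();

        // Both threads' increments are counted, so this prints 200000.
        Console.WriteLine(counter.Value);
    }
}
```

Replacing counter.Add(1.0) with counter.Value++ in the demo would allow the two threads to overwrite each other's updates, which is the situation described above.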

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow