ConcurrentDictionary<> performance at a single thread misunderstanding?

Question 1

The most likely reason is that ConcurrentDictionary simply has more overhead than Dictionary for the same operation. This is demonstrably true if you dig into the sources:

It uses a lock for the indexer
It uses volatile writes
It has to make writes atomic for value types whose writes are not guaranteed to be atomic in .NET
It has extra branches in the core add routine (whether to take a lock, do atomic write)

All of these costs are incurred irrespective of the number of threads it is being used on. Individually they may be small, but they aren't free and they add up over time.
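A rough way to observe this overhead on a single thread is sketched below. The class name `SingleThreadOverhead`, the helper `TimeWrites`, and the item count are illustrative choices of mine, not anything from the original post. Both collections are driven through `IDictionary<int, int>` so the same loop exercises either one; a proper benchmarking harness would give more trustworthy numbers than a bare `Stopwatch`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;

static class SingleThreadOverhead
{
    // Hypothetical helper: writes `count` keys through the indexer and
    // returns the elapsed time. Works for any IDictionary<int, int>.
    public static TimeSpan TimeWrites(IDictionary<int, int> map, int count)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
            map[i] = i;
        sw.Stop(); // stop before the caller does any printing
        return sw.Elapsed;
    }

    static void Main()
    {
        const int count = 1000000; // arbitrary size, chosen for illustration

        // Warm-up pass so JIT compilation is not part of the measurement.
        TimeWrites(new Dictionary<int, int>(), 1000);
        TimeWrites(new ConcurrentDictionary<int, int>(), 1000);

        Console.WriteLine("Dictionary:           " + TimeWrites(new Dictionary<int, int>(), count));
        Console.WriteLine("ConcurrentDictionary: " + TimeWrites(new ConcurrentDictionary<int, int>(), count));
    }
}
```

The absolute numbers depend on the runtime and the machine, so treat the output as a relative comparison only.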

Question 2

Well, ConcurrentDictionary is allowing for the possibility that it can be used by multiple threads. It seems entirely reasonable to me that that requires more internal housekeeping than something which assumes it can get away without worrying about access from multiple threads. I'd have been very surprised if it had worked out the other way round - if the safer version were always faster too, why would you ever use the less safe version?

Question 3

Update for .NET 5: I'll leave the previous answer up as it is still relevant for older runtimes but .NET 5 appears to have further improved ConcurrentDictionary to the point where reads via TryGetValue() are actually faster than even the normal Dictionary, as seen in the results below (COW is my CopyOnWriteDictionary, detailed below). Make what you will of this :)

|          Method |        Mean |     Error |    StdDev |    Gen 0 |    Gen 1 |    Gen 2 | Allocated |
|---------------- |------------:|----------:|----------:|---------:|---------:|---------:|----------:|
| ConcurrentWrite | 1,372.32 us | 12.752 us | 11.304 us | 226.5625 |  89.8438 |  44.9219 | 1398736 B |
|        COWWrite | 1,077.39 us | 21.435 us | 31.419 us |  56.6406 |  19.5313 |  11.7188 |  868629 B |
|       DictWrite |   347.19 us |  5.875 us |  5.208 us | 124.5117 | 124.5117 | 124.5117 |  673064 B |
|  ConcurrentRead |    63.53 us |  0.486 us |  0.431 us |        - |        - |        - |         - |
|         COWRead |    81.55 us |  0.908 us |  0.805 us |        - |        - |        - |         - |
|        DictRead |    70.71 us |  0.471 us |  0.393 us |        - |        - |        - |         - |

Previous answer, still relevant for < .NET 5:

The latest versions of ConcurrentDictionary have improved significantly since I originally posted this answer. It no longer locks on read and thus offers almost the same performance profile as my CopyOnWriteDictionary implementation, with more features, so I recommend you use ConcurrentDictionary in most cases. ConcurrentDictionary still has 20 - 30% more overhead than Dictionary or CopyOnWriteDictionary, so performance-sensitive applications may still benefit from using CopyOnWriteDictionary.

You can read about my lock-free thread-safe copy-on-write dictionary implementation here:

http://www.singulink.com/CodeIndex/post/fastest-thread-safe-lock-free-dictionary

It's currently append-only (with the ability to replace values) as it is intended for use as a permanent cache. If you need removal then I suggest using ConcurrentDictionary since adding that into CopyOnWriteDictionary would eliminate all performance gains due to the added locking.

CopyOnWriteDictionary is very fast for quick bursts of writes, and lookups usually run at almost standard Dictionary speed without locking. If you write occasionally and read often, this is the fastest option available.

My implementation provides maximum read performance by removing the need for any read locks under normal circumstances while updates aren't being made to the dictionary. The trade-off is that the dictionary needs to be copied and swapped after updates are applied (which is done on a background thread) but if you don't write often or you only write once during initialization then the trade-off is definitely worth it.
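The copy-and-swap idea can be sketched as follows. This is not the linked CopyOnWriteDictionary: `CopyOnWriteMap`, `Set`, and `CopyOnWriteDemo` are hypothetical names, and this simplified version copies synchronously on every write instead of batching updates on a background thread. It only illustrates why reads need no lock.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: a simplified copy-on-write map, not the linked implementation.
sealed class CopyOnWriteMap<TKey, TValue>
{
    private readonly object _writeLock = new object();

    // Readers only ever see a fully built dictionary that is never mutated again.
    private volatile Dictionary<TKey, TValue> _snapshot = new Dictionary<TKey, TValue>();

    // Lock-free read: one volatile load, then a plain Dictionary lookup.
    public bool TryGetValue(TKey key, out TValue value)
    {
        return _snapshot.TryGetValue(key, out value);
    }

    // Write: copy the current snapshot, mutate the copy, then publish it.
    // Each write costs O(n), which is the trade-off for lock-free reads.
    public void Set(TKey key, TValue value)
    {
        lock (_writeLock)
        {
            var copy = new Dictionary<TKey, TValue>(_snapshot);
            copy[key] = value;
            _snapshot = copy;
        }
    }
}

static class CopyOnWriteDemo
{
    static void Main()
    {
        var map = new CopyOnWriteMap<int, string>();
        map.Set(1, "one");
        string found;
        Console.WriteLine(map.TryGetValue(1, out found) ? found : "(missing)"); // prints "one"
    }
}
```

Because every write copies the whole dictionary, this shape only pays off when writes are rare compared to reads, which matches the permanent-cache use case described above.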

Question 4

ConcurrentDictionary vs. Dictionary

In general, use a System.Collections.Concurrent.ConcurrentDictionary in any scenario where you are adding and updating keys or values concurrently from multiple threads. In scenarios that involve frequent updates and relatively few reads, the ConcurrentDictionary generally offers modest benefits. In scenarios that involve many reads and many updates, the ConcurrentDictionary generally is significantly faster on computers that have any number of cores.

In scenarios that involve frequent updates, you can increase the degree of concurrency in the ConcurrentDictionary and then measure to see whether performance increases on computers that have more cores. If you change the concurrency level, avoid global operations as much as possible.

If you are only reading key or values, the Dictionary is faster because no synchronization is required if the dictionary is not being modified by any threads.

Link: https://msdn.microsoft.com/en-us/library/dd997373%28v=vs.110%29.aspx
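As a small illustration of the concurrency-level advice quoted above, the constructor overload taking `concurrencyLevel` and `capacity` can be used as below. The class name `ConcurrencyLevelExample`, the `Create` helper, and the specific numbers are my assumptions and should be tuned by measurement rather than copied.

```csharp
using System;
using System.Collections.Concurrent;

static class ConcurrencyLevelExample
{
    // Hypothetical factory wrapping the (concurrencyLevel, capacity) constructor.
    public static ConcurrentDictionary<int, string> Create(int concurrencyLevel, int capacity)
    {
        return new ConcurrentDictionary<int, string>(concurrencyLevel, capacity);
    }

    static void Main()
    {
        // Illustrative guess: twice the core count, room for 1024 items.
        var map = Create(Environment.ProcessorCount * 2, 1024);

        // Per-key operations (TryAdd, TryGetValue, GetOrAdd) involve at most one lock.
        map.TryAdd(1, "one");
        string value;
        Console.WriteLine(map.TryGetValue(1, out value) ? value : "(missing)"); // prints "one"

        // "Global" operations such as Count and ToArray acquire every lock,
        // so they get more expensive as the concurrency level grows.
        Console.WriteLine(map.Count); // prints 1
    }
}
```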

Question 5

The ConcurrentDictionary<> creates an internal set of locking objects at creation (this is determined by the concurrencyLevel, amongst other factors) - this set of locking objects is used to control access to the internal bucket structures in a series of fine-grained locks.

In a single threaded scenario, there would be no need for the locks, so the extra overhead of acquiring and releasing these locks is probably the source of the difference you're seeing.
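The fine-grained locking idea can be illustrated with a toy striped map. This is only a sketch of the general technique under my own assumptions: `StripedMap`, `StripeOf`, and `StripedMapDemo` are made-up names, and the real ConcurrentDictionary internals are more sophisticated (for example, its reads do not take a lock).

```csharp
using System;
using System.Collections.Generic;

// Illustrative lock striping: each stripe has its own lock and its own buckets.
sealed class StripedMap<TKey, TValue>
{
    private readonly object[] _locks;
    private readonly Dictionary<TKey, TValue>[] _stripes;

    public StripedMap(int stripeCount)
    {
        _locks = new object[stripeCount];
        _stripes = new Dictionary<TKey, TValue>[stripeCount];
        for (int i = 0; i < stripeCount; i++)
        {
            _locks[i] = new object();
            _stripes[i] = new Dictionary<TKey, TValue>();
        }
    }

    // Map a key to a stripe; masking keeps the hash non-negative.
    private int StripeOf(TKey key)
    {
        return (key.GetHashCode() & 0x7FFFFFFF) % _locks.Length;
    }

    // Threads working on different stripes never block each other,
    // but every call still pays for a lock acquire and release,
    // even when only one thread is using the map.
    public void Set(TKey key, TValue value)
    {
        int s = StripeOf(key);
        lock (_locks[s]) _stripes[s][key] = value;
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        int s = StripeOf(key);
        lock (_locks[s]) return _stripes[s].TryGetValue(key, out value);
    }
}

static class StripedMapDemo
{
    static void Main()
    {
        var map = new StripedMap<string, int>(8);
        map.Set("a", 1);
        int found;
        Console.WriteLine(map.TryGetValue("a", out found) ? found : -1); // prints 1
    }
}
```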

Question 6

There is no point in using ConcurrentDictionary, or in synchronizing access at all, if everything is done on a single thread. Of course Dictionary will beat ConcurrentDictionary there.

Much depends on the usage pattern and the number of threads. Here is a test showing that ConcurrentDictionary outperforms Dictionary plus lock as the number of threads increases.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

namespace ConsoleApp
{

    class Program
    {

        static void Main(string[] args)
        {
            Run(1, 100000, 10);
            Run(10, 100000, 10);
            Run(100, 100000, 10);
            Run(1000, 100000, 10);
            Console.ReadKey();
        }

        static void Run(int threads, int count, int cycles)
        {
            Console.WriteLine("");
            Console.WriteLine($"Threads: {threads}, items: {count}, cycles:{cycles}");

            var semaphore = new SemaphoreSlim(0, threads);

            var concurrentDictionary = new ConcurrentDictionary<int, string>();

            for (int i = 0; i < threads; i++)
            {
                Thread t = new Thread(() => Run(concurrentDictionary, count, cycles,  semaphore));
                t.Start();
            }

            Thread.Sleep(1000);

            var w = Stopwatch.StartNew();

            semaphore.Release(threads);

            for (int i = 0; i < threads; i++)
                semaphore.Wait();

            Console.WriteLine($"ConcurrentDictionary: {w.Elapsed}");

            var dictionary = new Dictionary<int, string>();
            for (int i = 0; i < threads; i++)
            {
                Thread t = new Thread(() => Run(dictionary, count, cycles, semaphore));
                t.Start();
            }

            Thread.Sleep(1000);

            w.Restart();

            semaphore.Release(threads);


            for (int i = 0; i < threads; i++)
                semaphore.Wait();

            Console.WriteLine($"Dictionary: {w.Elapsed}");

        }

        static void Run(ConcurrentDictionary<int, string> dic, int elements, int cycles, SemaphoreSlim semaphore)
        {
            semaphore.Wait();
            try
            {
                for (int i = 0; i < cycles; i++)
                    for (int j = 0; j < elements; j++)
                    {
                        // Lambda parameter renamed so it does not clash with the local 'x'.
                        var x = dic.GetOrAdd(i, key => key.ToString());
                    }
            }
            finally
            {
                semaphore.Release();
            }
        }

        static void Run(Dictionary<int, string> dic, int elements, int cycles, SemaphoreSlim semaphore)
        {
            semaphore.Wait();
            try
            {
                for (int i = 0; i < cycles; i++)
                    for (int j = 0; j < elements; j++)
                        lock (dic)
                        {
                            if (!dic.TryGetValue(i, out string value))
                                dic[i] = i.ToString();
                        }
            }
            finally
            {
                semaphore.Release();
            }
        }
    }
}

    Threads: 1, items: 100000, cycles:10
    ConcurrentDictionary: 00:00:00.0000499
    Dictionary: 00:00:00.0000137

    Threads: 10, items: 100000, cycles:10
    ConcurrentDictionary: 00:00:00.0497413
    Dictionary: 00:00:00.2638265

    Threads: 100, items: 100000, cycles:10
    ConcurrentDictionary: 00:00:00.2408781
    Dictionary: 00:00:02.2257736

    Threads: 1000, items: 100000, cycles:10
    ConcurrentDictionary: 00:00:01.8196668
    Dictionary: 00:00:25.5717232
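One caveat about the harness above: after `semaphore.Release(threads)` the main thread immediately calls `semaphore.Wait()` itself, so it can take back permits that were meant for the workers and stop the clock before the work has run. The 1-thread timings of a few microseconds for a million operations are consistent with that. A more conservative sketch, assuming a start gate plus `Thread.Join`, might look like the following; `FairHarness`, `Measure`, and `gate` are names I made up, and the workload is the same `GetOrAdd` versus `lock` loop.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

static class FairHarness
{
    // Starts `threads` workers, releases them together, and waits for all of them.
    public static TimeSpan Measure(int threads, Action work)
    {
        var gate = new ManualResetEventSlim(false);
        var workers = new Thread[threads];
        for (int i = 0; i < threads; i++)
        {
            workers[i] = new Thread(() => { gate.Wait(); work(); });
            workers[i].Start();
        }

        var sw = Stopwatch.StartNew();
        gate.Set();                          // release every worker at once
        foreach (var t in workers) t.Join(); // block until all work is done
        sw.Stop();
        gate.Dispose();
        return sw.Elapsed;
    }

    static void Main()
    {
        const int count = 100000, cycles = 10;
        foreach (int threads in new[] { 1, 10, 100 })
        {
            var concurrent = new ConcurrentDictionary<int, string>();
            var plain = new Dictionary<int, string>();

            TimeSpan c = Measure(threads, () =>
            {
                for (int i = 0; i < cycles; i++)
                    for (int j = 0; j < count; j++)
                        concurrent.GetOrAdd(i, key => key.ToString());
            });

            TimeSpan d = Measure(threads, () =>
            {
                for (int i = 0; i < cycles; i++)
                    for (int j = 0; j < count; j++)
                        lock (plain)
                        {
                            if (!plain.ContainsKey(i)) plain[i] = i.ToString();
                        }
            });

            Console.WriteLine($"Threads: {threads}, ConcurrentDictionary: {c}, Dictionary+lock: {d}");
        }
    }
}
```

Joining the worker threads guarantees that the stopwatch covers all of the work, whichever collection turns out faster on a given machine.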

Question 7

What makes ConcurrentDictionary<,> much slower in a single threaded environment?

The overhead of the machinery required to make it much faster in multi-threaded environments.

My first instinct is that lock(){} will always be slower, but apparently it is not.

A lock is very cheap when uncontested. You can lock a million times per second and your CPU won't even notice, provided that you are doing it from a single thread. What kills performance in multi-threaded programs is contention for locks. When multiple threads are competing fiercely for the same lock, almost all of them have to wait for the lucky one that holds the lock to release it. This is where the ConcurrentDictionary, with its granular locking implementation, shines. And the more concurrency you have (the more processors/cores), the more it shines.
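To get a feel for how cheap an uncontended lock is, a sketch like the one below can be timed on a single thread. `UncontendedLock` and `CountWithLock` are illustrative names of mine, and the iteration count is arbitrary; the actual timing depends on the machine.

```csharp
using System;
using System.Diagnostics;

static class UncontendedLock
{
    // Takes and releases the same lock `iterations` times from one thread.
    // No other thread ever competes for it, so there is no contention.
    public static long CountWithLock(int iterations)
    {
        object gate = new object();
        long counter = 0;
        for (int i = 0; i < iterations; i++)
        {
            lock (gate) counter++;
        }
        return counter;
    }

    static void Main()
    {
        CountWithLock(1000); // warm-up so JIT time is not measured

        var sw = Stopwatch.StartNew();
        long total = CountWithLock(1000000);
        sw.Stop();
        Console.WriteLine(total + " uncontended lock acquisitions took " + sw.Elapsed);
    }
}
```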

Question 8

Your test is wrong: you must stop the Stopwatch before reading the elapsed time!

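        // First measurement: 3,000,000 single-threaded writes to a ConcurrentDictionary.
        Stopwatch sw = new Stopwatch();
        sw.Start();
        var d = new ConcurrentDictionary<int, int>();
        for (int i = 0; i < 1000000; i++) d[i] = 123;
        for (int i = 1000000; i < 2000000; i++) d[i] = 123;
        for (int i = 2000000; i < 3000000; i++) d[i] = 123;
        sw.Stop(); // stop before printing so console output is not timed
        Console.WriteLine("baseline = " + sw.Elapsed);

        // Second measurement: the same writes to a Dictionary guarded by lock.
        // Restart (rather than Start) resets the elapsed time to zero, so this
        // measurement does not accumulate on top of the first one.
        sw.Restart();
        var d2 = new Dictionary<int, int>();
        for (int i = 0; i < 1000000; i++) lock (d2) d2[i] = 123;
        for (int i = 1000000; i < 2000000; i++) lock (d2) d2[i] = 123;
        for (int i = 2000000; i < 3000000; i++) lock (d2) d2[i] = 123;
        sw.Stop();
        Console.WriteLine("baseline = " + sw.Elapsed);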

--Output :