Question

We all know that when we want to find one item in a collection based on some key value, Dictionary/HashSet etc. are the fastest options available in C#. But obviously they rely on setting up buckets and calling a hash function on any key value used as an argument for a lookup, both of which have some overhead.

So by rights this must mean that, for a collection up to a certain size, looping over each item in a list/array looking for a match by "brute force" must be quicker (i.e. the List.Contains method).
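The two approaches being compared can be sketched side by side, assuming string keys. `LinearContains` is a hypothetical helper that does roughly what `List<T>.Contains` does (compare against each element via `EqualityComparer<T>.Default`), while `HashSet<T>.Contains` hashes the key once and then only compares against entries in one bucket.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: brute-force linear search, comparing the key against
// each element until a match is found (up to n equality checks).
static bool LinearContains<T>(IReadOnlyList<T> items, T key)
{
    var comparer = EqualityComparer<T>.Default;
    for (int i = 0; i < items.Count; i++)
    {
        if (comparer.Equals(items[i], key))
            return true;
    }
    return false;
}

var keys = new List<string> { "alpha", "beta", "gamma" };
var set = new HashSet<string>(keys);

// Linear search: no hashing, but up to keys.Count string comparisons.
bool foundLinear = LinearContains(keys, "gamma");
// Hash lookup: one GetHashCode call on the key, then comparisons within a single bucket.
bool foundHashed = set.Contains("gamma");

Console.WriteLine($"{foundLinear} {foundHashed}"); // prints "True True"
```

The trade-off in the question is exactly this: for small n, are n cheap comparisons cheaper than one hash computation plus (usually) one comparison?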

There's an article at http://www.dotnetperls.com/dictionary-time that suggests this threshold is three items. Frankly I'm surprised that a Dictionary performs better with so few items!

I'm curious as to whether any of you out there have done your own benchmarks and can verify this. I'm also curious about the time required to instantiate the Dictionary and List, which the article above leaves out. In most of the insert-light/read-heavy situations we'd use a dictionary for, it's probably irrelevant, but in some cases it could be an important factor in deciding which to use.
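A rough sketch of how the construction cost could be measured, assuming a tiny four-key collection: build each collection many times in a loop and compare the totals with `Stopwatch`. `TimeIt` is a hypothetical helper, and the key set and iteration count are arbitrary; a single build is far too fast to time on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Build each collection from the same four keys many times and compare totals.
const int Iterations = 100_000;
string[] source = { "alpha", "beta", "gamma", "delta" };

var listResult = TimeIt(() => new List<string>(source).Count, Iterations);

var dictResult = TimeIt(() =>
{
    var dict = new Dictionary<string, int>(source.Length);
    for (int i = 0; i < source.Length; i++)
        dict.Add(source[i], i);
    return dict.Count;
}, Iterations);

Console.WriteLine($"List:       {listResult.Ms:F1} ms for {Iterations} builds");
Console.WriteLine($"Dictionary: {dictResult.Ms:F1} ms for {Iterations} builds");

static (double Ms, long Checksum) TimeIt(Func<int> build, int iterations)
{
    build(); // warm-up call so first-call JIT compilation is not timed
    long checksum = 0; // use every result so the builds are not dead code
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
        checksum += build();
    sw.Stop();
    return (sw.Elapsed.TotalMilliseconds, checksum);
}
```

Passing the expected size to the `Dictionary` constructor avoids resizing during the adds, which keeps the comparison about the initial bucket setup rather than growth.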

Also: if this is the case (and a Dictionary really is a better choice than a List with four or more values), then why is it so? The example benchmarked in the article uses string keys; is there a much bigger performance cost to the default string equality operator/IEquatable implementation than I realise? Does a Dictionary always call the key's IEquatable implementation during a lookup, or only in the case of a hash collision?
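One way to check the last question empirically, assuming .NET 8 or later (for `EqualityComparer<T>.Create`), is to give the Dictionary an instrumented comparer that counts calls. In the current implementations a lookup hashes the key once and compares the stored hash codes before calling `Equals`, so a hit normally needs at least one `Equals` call to confirm the match, even with no collision, while a miss may need none. Treat the exact counts as implementation details rather than guarantees.

```csharp
using System;
using System.Collections.Generic;

// Counters incremented by the instrumented comparer below.
int hashCalls = 0;
int equalsCalls = 0;

// EqualityComparer<T>.Create (.NET 8+) builds a comparer from two delegates;
// here they just count how often Dictionary invokes them.
var countingComparer = EqualityComparer<int>.Create(
    (a, b) => { equalsCalls++; return a == b; },
    x => { hashCalls++; return x.GetHashCode(); });

var dict = new Dictionary<int, string>(countingComparer)
{
    [1] = "one",
    [2] = "two",
    [3] = "three",
};

// Reset after construction: inserting also hashes each key.
hashCalls = 0;
equalsCalls = 0;
bool hit = dict.TryGetValue(2, out var value);
int hashOnHit = hashCalls, equalsOnHit = equalsCalls;

hashCalls = 0;
equalsCalls = 0;
bool miss = dict.ContainsKey(99);

Console.WriteLine($"hit:  found={hit}, GetHashCode={hashOnHit}, Equals={equalsOnHit}");
Console.WriteLine($"miss: found={miss}, GetHashCode={hashCalls}, Equals={equalsCalls}");
```

So the key's equality implementation is called on every successful lookup, not only on collisions; what the hash buys is that it is called against very few candidates.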

And finally: would this threshold of three items be much different if the key type had a simpler equality test (like Int32/Int64/Guid)?

No correct solution

OTHER TIPS

The ListDictionary class is provided for the very reason you mention. Its documentation describes it as:

Recommended for collections that typically include fewer than 10 items.
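A minimal usage sketch of `System.Collections.Specialized.ListDictionary`. It is non-generic, so keys and values are typed as `object`, and per its documentation it is implemented as a singly linked list, so a lookup is a linear walk with no hashing and no bucket array.

```csharp
using System;
using System.Collections.Specialized;

// Entries live in a singly linked list; Contains/indexer walk it from the head.
var small = new ListDictionary();
small.Add("alpha", 1);
small.Add("beta", 2);
small.Add("gamma", 3);

// Values come back as object (the class predates generics), so reads need a cast.
bool hasBeta = small.Contains("beta");
int gamma = (int)small["gamma"];

Console.WriteLine($"{hasBeta} {gamma} {small.Count}"); // prints "True 3 3"
```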

Microsoft also provides a HybridDictionary, which lets you get the best of both worlds. Its documentation describes its typical usage as follows:

This class is recommended for cases where the number of elements in a dictionary is unknown. It takes advantage of the improved performance of a ListDictionary with small collections, and offers the flexibility of switching to a Hashtable which handles larger collections better than ListDictionary.
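A short sketch of `HybridDictionary` in use. The point is that the API stays the same as the collection grows; according to its documentation it starts out as a `ListDictionary` and switches to a `Hashtable` internally once it gets large, and that switch is invisible to the caller.

```csharp
using System;
using System.Collections.Specialized;

// Same non-generic IDictionary API throughout; the internal storage changes
// from a linked list to a Hashtable somewhere during these adds.
var hybrid = new HybridDictionary();

for (int i = 0; i < 100; i++)
    hybrid.Add($"key{i}", i);

bool hasKey5 = hybrid.Contains("key5");
int value42 = (int)hybrid["key42"];

Console.WriteLine($"{hasKey5} {value42} {hybrid.Count}"); // prints "True 42 100"
```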

As for your specific case, the only way to see which performs best is to benchmark.
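A crude benchmark sketch in that spirit, comparing `List<T>.Contains` with `HashSet<T>.Contains` for 1 to 8 items using both `string` and `int` keys (the `int` case bears on the question about simpler key types). `Report` is a hypothetical helper; the sizes, key values and iteration count are arbitrary, and a plain `Stopwatch` loop does not control for tiered JIT, GC or CPU frequency the way a harness such as BenchmarkDotNet does, so treat the numbers as indicative only.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

const int Lookups = 1_000_000;
long totalHits = 0;

for (int n = 1; n <= 8; n++)
{
    // String keys: equality compares characters; hashing reads the whole string.
    var strings = new List<string>();
    for (int i = 0; i < n; i++) strings.Add("key" + i);
    totalHits += Report("string", n, strings, strings[n - 1], Lookups);

    // Int keys: equality and hashing are both trivial.
    var ints = new List<int>();
    for (int i = 0; i < n; i++) ints.Add(i * 7);
    totalHits += Report("int", n, ints, ints[n - 1], Lookups);
}

// Probes with the last element: the worst successful case for a linear search.
static int Report<T>(string label, int n, List<T> list, T probe, int lookups)
{
    var set = new HashSet<T>(list);
    int hits = 0;

    list.Contains(probe); // warm-up calls so first-call JIT cost is not timed
    set.Contains(probe);

    var sw = Stopwatch.StartNew();
    for (int i = 0; i < lookups; i++)
        if (list.Contains(probe)) hits++;
    double listMs = sw.Elapsed.TotalMilliseconds;

    sw.Restart();
    for (int i = 0; i < lookups; i++)
        if (set.Contains(probe)) hits++;
    double setMs = sw.Elapsed.TotalMilliseconds;

    Console.WriteLine($"{label,-6} n={n}: List {listMs,8:F1} ms  HashSet {setMs,8:F1} ms");
    return hits;
}
```

Counting hits and returning them keeps the lookups observable, so the loops cannot be dropped as dead code.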

(Note that the examples above are for information purposes only! You will generally be much better off using the new .NET generic collections...)

The reason the article probably doesn't go into detail about the cost of setting up the dictionary/list is that it's largely trivial. For that matter, if you're only going to do a single lookup in a data structure, it really doesn't matter how you implement it, because it will take a minuscule amount of time.

What we care about are the accesses: we generally access a data structure many times, and the cost of those repeated accesses greatly outweighs any difference in setup time.
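The amortization argument can be made concrete with a simple linear cost model (an illustrative assumption, not a measurement): let $S$ be the one-off setup cost of each structure and $c$ its average cost per lookup, so that $k$ lookups cost

```latex
T_{\text{list}}(k) = S_{\text{list}} + k\,c_{\text{list}}, \qquad
T_{\text{dict}}(k) = S_{\text{dict}} + k\,c_{\text{dict}}

T_{\text{dict}}(k) < T_{\text{list}}(k)
\iff k > \frac{S_{\text{dict}} - S_{\text{list}}}{c_{\text{list}} - c_{\text{dict}}}
\qquad (\text{assuming } c_{\text{list}} > c_{\text{dict}})
```

Because the setup difference is paid once while the per-lookup difference is paid $k$ times, any per-lookup advantage eventually dominates once $k$ is large enough.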

As for why a list is slower even with so few items: it's because that's not what lists are designed for. The key here is that computation is generally much faster than memory access. If you're looking for a specific item in your data structure, an algorithm that tells you where to look (a hash function) with minimal memory access speeds things up quite noticeably. If you need to access items sequentially, as is often the case with strings, then a list is what you want.
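To illustrate "an algorithm that tells you where to look", here is a toy separate-chaining index. It is a simplification for illustration only; `Dictionary<TKey,TValue>` lays out its buckets and entries differently. `BucketOf`, `Insert` and `Lookup` are hypothetical names for this sketch. The hash code selects one bucket, so a lookup compares the key only against the few keys stored there rather than against every key.

```csharp
using System;
using System.Collections.Generic;

const int BucketCount = 7;
var buckets = new List<string>[BucketCount];
for (int b = 0; b < BucketCount; b++) buckets[b] = new List<string>();

// Clear the sign bit so the bucket index is never negative.
static int BucketOf(string key, int bucketCount) =>
    (key.GetHashCode() & 0x7FFFFFFF) % bucketCount;

void Insert(string key) => buckets[BucketOf(key, BucketCount)].Add(key);

// Returns whether the key is present and how many stored keys were compared.
(bool Found, int Compared) Lookup(string key)
{
    var bucket = buckets[BucketOf(key, BucketCount)];
    int compared = 0;
    foreach (var candidate in bucket)
    {
        compared++;
        if (candidate == key) return (true, compared);
    }
    return (false, compared);
}

string[] words = { "apple", "banana", "cherry", "date", "elderberry", "fig", "grape" };
foreach (var w in words) Insert(w);

var (found, compared) = Lookup("fig");
Console.WriteLine($"found={found}, compared {compared} of {words.Length} keys");
```

String hash codes are randomized per process on .NET Core, so the exact number of comparisons varies from run to run, but it is bounded by the size of one bucket rather than the whole collection.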

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow