Why do BCL Collections use struct enumerators, not classes?

https://stackoverflow.com/questions/3168311

02-10-2019
|

Question

We all know mutable structs are evil in general. I'm also pretty sure that because IEnumerable<T>.GetEnumerator() returns type IEnumerator<T>, the structs are immediately boxed into a reference type, costing more than if they were simply reference types to begin with.

So why, in the BCL generic collections, are all the enumerators mutable structs? Surely there had to have been a good reason. The only thing that occurs to me is that structs can be copied easily, thus preserving the enumerator state at an arbitrary point. But adding a Copy() method to the IEnumerator interface would have been less troublesome, so I don't see this as being a logical justification on its own.

Even if I don't agree with a design decision, I would like to be able to understand the reasoning behind it.

Solution

Indeed, it is for performance reasons. The BCL team did a lot of research on this point before deciding to go with what you rightly call out as a suspicious and dangerous practice: the use of a mutable value type.

You ask why this doesn't cause boxing. It's because the C# compiler does not generate code to box stuff to IEnumerable or IEnumerator in a foreach loop if it can avoid it!

When we see

foreach(X x in c)

the first thing we do is check to see if c has a method called GetEnumerator. If it does, then we check to see whether the type it returns has method MoveNext and property current. If it does, then the foreach loop is generated entirely using direct calls to those methods and properties. Only if "the pattern" cannot be matched do we fall back to looking for the interfaces.

This has two desirable effects.

First, if the collection is, say, a collection of ints, but was written before generic types were invented, then it does not take the boxing penalty of boxing the value of Current to object and then unboxing it to int. If Current is a property that returns an int, we just use it.

Second, if the enumerator is a value type then it does not box the enumerator to IEnumerator.

Like I said, the BCL team did a lot of research on this and discovered that the vast majority of the time, the penalty of allocating and deallocating the enumerator was large enough that it was worth making it a value type, even though doing so can cause some crazy bugs.

For example, consider this:

struct MyHandle : IDisposable { ... }
...
using (MyHandle h = whatever)
{
    h = somethingElse;
}

You would quite rightly expect the attempt to mutate h to fail, and indeed it does. The compiler detects that you are trying to change the value of something that has a pending disposal, and that doing so might cause the object that needs to be disposed to actually not be disposed.

Now suppose you had:

struct MyHandle : IDisposable { ... }
...
using (MyHandle h = whatever)
{
    h.Mutate();
}

What happens here? You might reasonably expect that the compiler would do what it does if h were a readonly field: make a copy, and mutate the copy in order to ensure that the method does not throw away stuff in the value that needs to be disposed.

However, that conflicts with our intuition about what ought to happen here:

using (Enumerator enumtor = whatever)
{
    ...
    enumtor.MoveNext();
    ...
}

We expect that doing a MoveNext inside a using block will move the enumerator to the next one regardless of whether it is a struct or a ref type.

Unfortunately, the C# compiler today has a bug. If you are in this situation we choose which strategy to follow inconsistently. The behaviour today is:

if the value-typed variable being mutated via a method is a normal local then it is mutated normally
but if it is a hoisted local (because it's a closed-over variable of an anonymous function or in an iterator block) then the local is actually generated as a read-only field, and the gear that ensures that mutations happen on a copy takes over.

Unfortunately the spec provides little guidance on this matter. Clearly something is broken because we're doing it inconsistently, but what the right thing to do is not at all clear.

OTHER TIPS

Struct methods are inlined when type of struct is known at compile time, and calling method via interface is slow, so answer is: because of performance reason.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow