Question

So I thought I understood C#'s yield return as being largely the same as Python's yield, which I thought I understood. I thought that the compiler transforms a function into an object with a pointer to where execution should resume; when a request for the next value comes along, the object runs up to the next yield, updates the pointer to where execution should resume, and returns a value.

In Python this works somewhat like lazy evaluation in that it produces values as needed, but once the values have been used they can be gc'ed if not saved in another variable. Trying to iterate over the result of such a function a second time yields nothing unless you first convert it to a list.

ex.

def y():
    values = [1, 2, 3, 4]  # renamed from "list" to avoid shadowing the builtin

    for i in values:
        yield str(i)

ys = y()
print("first ys:")
print(",".join(ys))
print("second ys:")
print(",".join(ys))

outputs

first ys:
1,2,3,4
second ys:

Until recently I thought the same was true for C#, but trying it out in dotnetfiddle showed otherwise.

http://dotnetfiddle.net/W5Cbv6

using System;
using System.Linq;
using System.Collections.Generic;

public class Program
{
    public static IEnumerable<string> Y()
    {
        var list = new List<string> {"1","2","3","4","5"};
        foreach(var i in list)
        {
            yield return i;
        }
    }

    public static void Main()
    {


        var ys = Y();
        Console.WriteLine("first ys");
        Console.WriteLine(string.Join(",", ys));
        Console.WriteLine("second ys");
        Console.WriteLine(string.Join(",", ys));

    }
}

outputs

first ys
1,2,3,4,5
second ys
1,2,3,4,5

What is happening here? Is it caching the result? It can't be, right? Otherwise File.ReadLines would blow up on huge files. Is it simply restarting the function from the top a second time?

note: I'm a bit uncertain about some of the terminology of generators and coroutines so I've tried to avoid labelling.

Solution

You're very close. An IEnumerable is an object capable of creating an iterator (an IEnumerator). An IEnumerator behaves exactly as you've described.

So the IEnumerable generates generators.

Unless you go out of your way to generate some sort of state shared between the generated iterators, IEnumerator objects won't affect each other, whether they are from separate calls to the iterator block or another IEnumerator generated by the same IEnumerable.
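
To tie this back to the Python example in the question, here is a minimal sketch of a Python object that behaves like the C# IEnumerable rather than like a bare generator. The class name YEnumerable is made up for illustration; the only point is that __iter__ plays the role of GetEnumerator() and hands out a fresh generator on every call.

```python
class YEnumerable:
    """Hypothetical Python stand-in for the IEnumerable returned by Y()."""

    def __iter__(self):
        # Plays the role of GetEnumerator(): because this method contains
        # yield, every call returns a brand-new generator with its own state.
        for i in [1, 2, 3, 4, 5]:
            yield str(i)


ys = YEnumerable()
first = ",".join(ys)   # join calls iter(ys) and gets one generator
second = ",".join(ys)  # join calls iter(ys) again and gets another one
print(first)
print(second)

# Two iterators obtained from the same object advance independently.
a, b = iter(ys), iter(ys)
pair = (next(a), next(a), next(b))
print(pair)
```

Both joins see the full sequence here, and advancing a does not move b, which is the behavior the C# fiddle showed.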

OTHER TIPS

After looking through every part of the code, I believe it has to do with IEnumerable<>. According to MSDN, an IEnumerable is not an enumerator itself; it creates a new enumerator every time GetEnumerator() is called. foreach (and, I imagine, string.Join) calls GetEnumerator(), which creates fresh state on every call. As an example, here's the code again, this time using an enumerator explicitly:

using System;
using System.Linq;
using System.Collections.Generic;

public class Program
{
    public static IEnumerable<string> Y()
    {
        var list = new List<string> {"1","2","3","4","5"};
        foreach(var i in list)
        {
            yield return i;
        }
    }
    
    public static void Main()
    {
        
        
        var ys = Y();
        Console.WriteLine("first ys");
        Console.WriteLine(string.Join(",", ys));
        IEnumerator<string> i = ys.GetEnumerator();
        Console.WriteLine(""+i.MoveNext()+": "+i.Current);
        Console.WriteLine(""+i.MoveNext()+": "+i.Current);
        Console.WriteLine(""+i.MoveNext()+": "+i.Current);
        Console.WriteLine(""+i.MoveNext()+": "+i.Current);
        Console.WriteLine(""+i.MoveNext()+": "+i.Current);
        Console.WriteLine(""+i.MoveNext()+": "+i.Current);
    }
}

(dotnetfiddle)

Once MoveNext() returns false, the enumerator is exhausted and stays that way, which is the same behavior you saw in Python.

The reason the code behaves differently in each case is that in Python you are using the same generator (the counterpart of a single IEnumerator instance) twice, and by the second time it has already been enumerated (it cannot repeat, so it does not). In C#, however, each call to GetEnumerator() returns a new IEnumerator, which iterates through the collection from the beginning. Each enumerator instance does not affect other enumerators. Enumerators do not implicitly lock the collection, so the two enumerators can both loop through the entire collection. Your Python example only uses one enumerator, so without a reset it can only iterate once.
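
A quick way to check this on the Python side (just a sketch reusing the generator from the question): a generator is its own iterator, so asking it for an iterator returns the same, possibly exhausted, object. Only calling the generator function again gives a fresh one.

```python
def y():
    for i in [1, 2, 3, 4]:
        yield str(i)


ys = y()
# A generator is its own iterator: iter() hands back the very same object,
# so there is nothing comparable to GetEnumerator() creating a new one.
same_object = iter(ys) is ys

first = ",".join(ys)
second = ",".join(ys)   # the single generator is already exhausted

# Calling the generator function again is what produces a fresh iterator.
third = ",".join(y())
print(same_object, repr(first), repr(second), repr(third))
```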

The yield keyword is a utility for writing methods that return IEnumerable or IEnumerator more easily. The compiler implements the interface for you, and each yield return supplies the next element of the sequence. With each call to Y(), a new enumerable is constructed, but each enumerable can have more than one enumerator. Each call to String.Join calls GetEnumerator internally and so gets a new enumerator. Therefore, with each call to String.Join, you loop through the entire collection from start to finish.

When the compiler sees the yield keyword it will implement a state machine in a nested private class inside the Program class. This nested class will implement IEnumerator (and, as the declaration below shows, IEnumerable as well). Before C# had the yield keyword, we needed to do this ourselves. This is a slightly simplified and more readable version:

private sealed class EnumeratorWithSomeWeirdName : IEnumerator<string>, IEnumerable<string>
{
private string _current;
private int _state = 0;
private List<string> list_;
private List<string>.Enumerator _wrap;

public string Current
{
    get { return _current; }
}

object IEnumerator.Current
{
    get { return _current; }
}

public bool MoveNext()
{
    switch (_state) {
        case 0:
            _state = -1;
            list_ = new List<string>();
            list_.Add("1");
            list_.Add("2");
            list_.Add("3");
            list_.Add("4");
            list_.Add("5");
            _wrap = list_.GetEnumerator();
            _state = 1;
            break;
        case 1:
            return false;
        case 2:
            _state = 1;
            break;
        default:
            return false;
    }
    if (_wrap.MoveNext()) {
        _current = _wrap.Current;
        _state = 2;
        return true;
    }
    _state = -1;
    return false;
}

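// Implements IEnumerable<string>; must be public to satisfy the interface.
public IEnumerator<string> GetEnumerator()
{
    return new EnumeratorWithSomeWeirdName();
}

// Explicit implementation of the non-generic IEnumerable.GetEnumerator().
IEnumerator IEnumerable.GetEnumerator()
{
    return new EnumeratorWithSomeWeirdName();
}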

void IDisposable.Dispose()
{
    _wrap.Dispose();
}

void IEnumerator.Reset()
{
    throw new NotSupportedException();
}

}

The Y() method will change too. It will simply return an instance of this nested class:

public static IEnumerable<string> Y()
{
    return new EnumeratorWithSomeWeirdName();
}

Notice that nothing happens at this point. You are only getting an instance of this class. Only when you start enumerating (with the foreach loop) will the MoveNext() method on the instance be called. This yields the items one at a time. (This is important to realize.)
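
The same "nothing happens until you enumerate" behavior can be observed with a Python generator. This is only an illustrative sketch; the log list is a made-up helper for recording when the body actually runs.

```python
log = []  # hypothetical helper that records when the body executes


def y():
    log.append("body started")
    for i in ["1", "2", "3"]:
        yield i


ys = y()                      # only creates the generator object
log_after_call = list(log)    # still empty: the body has not run yet

first = next(ys)              # runs the body up to the first yield
log_after_next = list(log)
print(log_after_call, first, log_after_next)
```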

The foreach loop is also syntactic sugar; it actually calls GetEnumerator():

using(IEnumerator<string> enumerator = list.GetEnumerator()) {
    while (enumerator.MoveNext()) yield return enumerator.Current;
}
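
For comparison, Python's for loop can be unrolled in much the same way. The sketch below is a rough hand expansion, not the interpreter's actual implementation, with the C# counterpart of each step noted in the comments.

```python
def y():
    for i in ["1", "2", "3"]:
        yield i


collected = []

# Roughly what `for s in y(): collected.append(s)` does.
iterator = iter(y())              # analogous to GetEnumerator()
try:
    while True:
        try:
            s = next(iterator)    # analogous to MoveNext() plus Current
        except StopIteration:     # analogous to MoveNext() returning false
            break
        collected.append(s)
finally:
    # Not something Python's for loop does; added only to mirror the
    # Dispose() call that the C# using block makes.
    iterator.close()

print(collected)
```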

If you call ys.GetEnumerator() you can even see that it has a method MoveNext() and a property Current, just like an IEnumerator should.

If your Main method had a line like:

foreach (string s in ys) Console.WriteLine(s);

and you stepped through it with the debugger, you would see the debugger jumping back and forth between the Main and Y methods. It is normally impossible to go in and out of a method like this, but because in reality it is actually a class, this works. (Because string.Join simply enumerates the whole thing, your example would not show this.)
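
If you don't have a debugger at hand, the back-and-forth can be made visible by recording events from both sides. Here is a small Python sketch of the idea; the events list and the message strings are invented for the illustration.

```python
events = []  # hypothetical trace of who is running at each step


def y():
    for i in ["1", "2"]:
        events.append("Y: about to yield " + i)
        yield i
    events.append("Y: finished")


for s in y():
    events.append("Main: got " + s)

print("\n".join(events))
```

The trace alternates between the producer and the consumer instead of showing all of Y first, which is what the debugger would show you as well.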

Now, every time you call

Console.WriteLine(string.Join(",", ys));

another enumeration takes place, so another enumerator is created. This is possible because the inner class also implements IEnumerable. (They really thought of everything when they implemented the yield keyword.) So there's a lot of compiler magic going on: one line with a yield return turns into an entire class.
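
To show that there is no real magic involved, here is a hand-written state machine in Python that does roughly what the generated class does. It is a sketch under assumptions: the class name and state numbers are invented, and it indexes into the list instead of wrapping a list enumerator. Python iterators normally return self from __iter__; returning a new instance is a deliberate deviation to mimic the C# class being both IEnumerable and IEnumerator.

```python
class YStateMachine:
    """Hypothetical hand-written counterpart of a generator for Y()."""

    def __init__(self):
        self._state = 0        # 0 = not started, 1 = inside the loop, -1 = done
        self._items = None
        self._index = 0

    def __iter__(self):
        # Like GetEnumerator() in the C# class: hand out a fresh machine.
        return YStateMachine()

    def __next__(self):
        if self._state == 0:
            # First call: run the code that precedes the loop.
            self._items = ["1", "2", "3", "4", "5"]
            self._state = 1
        if self._state == 1 and self._index < len(self._items):
            value = self._items[self._index]
            self._index += 1
            return value       # corresponds to `yield return i`
        self._state = -1
        raise StopIteration    # corresponds to MoveNext() returning false


ys = YStateMachine()
first = ",".join(ys)
second = ",".join(ys)
print(first)
print(second)
```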

The compiler creates an object that implements IEnumerable for your Y method.

This object is basically a state machine which keeps track of the current state while the enumerator is moved forward. Look at the IL of the MoveNext method of the enumerator created by the IEnumerable returned from your Y method:

        IL_0000: ldarg.0
        IL_0001: ldfld int32 Program/'<Y>d__1'::'<>1__state'
        IL_0006: stloc.1
        IL_0007: ldloc.1
        IL_0008: switch (IL_001e, IL_00e8, IL_00ce)

        IL_0019: br IL_00e8

        IL_001e: ldarg.0
        IL_001f: ldc.i4.m1
        IL_0020: stfld int32 Program/'<Y>d__1'::'<>1__state'
        IL_0025: ldarg.0
        IL_0026: ldarg.0
        IL_0027: newobj instance void class [mscorlib]System.Collections.Generic.List`1<string>::.ctor()
        IL_002c: stfld class [mscorlib]System.Collections.Generic.List`1<string> Program/'<Y>d__1'::'<>g__initLocal0'
        IL_0031: ldarg.0
        IL_0032: ldfld class [mscorlib]System.Collections.Generic.List`1<string> Program/'<Y>d__1'::'<>g__initLocal0'
        IL_0037: ldstr "1"
        IL_003c: callvirt instance void class [mscorlib]System.Collections.Generic.List`1<string>::Add(!0)
        IL_0041: ldarg.0
        IL_0042: ldfld class [mscorlib]System.Collections.Generic.List`1<string> Program/'<Y>d__1'::'<>g__initLocal0'
        IL_0047: ldstr "2"
        IL_004c: callvirt instance void class [mscorlib]System.Collections.Generic.List`1<string>::Add(!0)
        IL_0051: ldarg.0
        IL_0052: ldfld class [mscorlib]System.Collections.Generic.List`1<string> Program/'<Y>d__1'::'<>g__initLocal0'
        IL_0057: ldstr "3"
        IL_005c: callvirt instance void class [mscorlib]System.Collections.Generic.List`1<string>::Add(!0)
        IL_0061: ldarg.0
        IL_0062: ldfld class [mscorlib]System.Collections.Generic.List`1<string> Program/'<Y>d__1'::'<>g__initLocal0'
        IL_0067: ldstr "4"
        IL_006c: callvirt instance void class [mscorlib]System.Collections.Generic.List`1<string>::Add(!0)
        IL_0071: ldarg.0
        IL_0072: ldfld class [mscorlib]System.Collections.Generic.List`1<string> Program/'<Y>d__1'::'<>g__initLocal0'
        IL_0077: ldstr "5"
        IL_007c: callvirt instance void class [mscorlib]System.Collections.Generic.List`1<string>::Add(!0)
        IL_0081: ldarg.0
        IL_0082: ldfld class [mscorlib]System.Collections.Generic.List`1<string> Program/'<Y>d__1'::'<>g__initLocal0'
        IL_0087: stfld class [mscorlib]System.Collections.Generic.List`1<string> Program/'<Y>d__1'::'<list>5__2'
        IL_008c: ldarg.0
        IL_008d: ldarg.0
        IL_008e: ldfld class [mscorlib]System.Collections.Generic.List`1<string> Program/'<Y>d__1'::'<list>5__2'
        IL_0093: callvirt instance valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<!0> class [mscorlib]System.Collections.Generic.List`1<string>::GetEnumerator()
        IL_0098: stfld valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<string> Program/'<Y>d__1'::'<>7__wrap4'
        IL_009d: ldarg.0
        IL_009e: ldc.i4.1
        IL_009f: stfld int32 Program/'<Y>d__1'::'<>1__state'
        IL_00a4: br.s IL_00d5

        IL_00a6: ldarg.0
        IL_00a7: ldarg.0
        IL_00a8: ldflda valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<string> Program/'<Y>d__1'::'<>7__wrap4'
        IL_00ad: call instance !0 valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<string>::get_Current()
        IL_00b2: stfld string Program/'<Y>d__1'::'<i>5__3'
        IL_00b7: ldarg.0
        IL_00b8: ldarg.0
        IL_00b9: ldfld string Program/'<Y>d__1'::'<i>5__3'
        IL_00be: stfld string Program/'<Y>d__1'::'<>2__current'
        IL_00c3: ldarg.0
        IL_00c4: ldc.i4.2
        IL_00c5: stfld int32 Program/'<Y>d__1'::'<>1__state'
        IL_00ca: ldc.i4.1
        IL_00cb: stloc.0
        IL_00cc: leave.s IL_00f3

        IL_00ce: ldarg.0
        IL_00cf: ldc.i4.1
        IL_00d0: stfld int32 Program/'<Y>d__1'::'<>1__state'

        IL_00d5: ldarg.0
        IL_00d6: ldflda valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<string> Program/'<Y>d__1'::'<>7__wrap4'
        IL_00db: call instance bool valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<string>::MoveNext()
        IL_00e0: brtrue.s IL_00a6

        IL_00e2: ldarg.0
        IL_00e3: call instance void Program/'<Y>d__1'::'<>m__Finally5'()

        IL_00e8: ldc.i4.0
        IL_00e9: stloc.0
        IL_00ea: leave.s IL_00f3

When the enumerator object is in its initial state (it has just been created by the GetEnumerator call), the method runs the code that precedes the first yield, which for your Y method builds the internal list holding all the values to be yielded. Subsequent calls to MoveNext operate on that internal list until it is exhausted. This means that every time someone starts iterating over the returned IEnumerable, a new enumerator is created and you start all over.

The same happens with File.ReadLines. Every time you start iterating, a new file handle is created, and one line is read from the underlying stream for every call to MoveNext/Current.
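
A rough Python analog of that behavior, to show why nothing has to be cached. This is a sketch of the pattern only, not a description of how File.ReadLines is implemented in .NET; the LineReader class and the temporary file are made up for the example.

```python
import os
import tempfile


class LineReader:
    """Hypothetical lazy line source; opens the file anew per iteration."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # A new file handle per enumeration; lines are read one at a time,
        # so the whole file is never held in memory.
        with open(self.path) as handle:
            for line in handle:
                yield line.rstrip("\n")


# Create a small throwaway file to read from.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\nb\nc\n")

lines = LineReader(tmp.name)
first_pass = list(lines)
second_pass = list(lines)     # reopens the file and starts from the top
os.remove(tmp.name)
print(first_pass, second_pass)
```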

I don't know about Python, but in C# the yield keyword is essentially an auto-implemented iterator object using the code "surrounding" the yield statements as the iterator logic.

The compiler emits objects that implement the IEnumerable<T> and IEnumerator<T> interfaces.

IEnumerable says that an object can be enumerated and provides the GetEnumerator() method. Any code that consumes an IEnumerable object calls the GetEnumerator() method at some point.

The call to the GetEnumerator() method returns an object that implements the IEnumerator interface. IEnumerator is the implementation of the iterator pattern in C#/CLR, and it is this iterator object (not the IEnumerable one) that holds the state of the enumeration; i.e., the object implementing the IEnumerator interface is a finite-state machine (FSM, finite-state automaton). The yield return and yield break keywords represent state transitions within this FSM.

So what is happening in your example code is this: each enumeration of the object returned by your Y() method obtains a new IEnumerator instance containing your logic, and each of these instances has its own state, so enumerating over them is independent of each other.

I hope I've written it in a way that makes sense and clarifies the issue for you.

Licensed under: CC-BY-SA with attribution