Why can't IEnumerator's be cloned?

https://stackoverflow.com/questions/807991

03-07-2019
|

Question

In implementing a basic Scheme interpreter in C# I discovered, to my horror, the following problem:

IEnumerator doesn't have a clone method! (or more precisely, IEnumerable can't provide me with a "cloneable" enumerator).

What I'd like:

interface IEnumerator<T>
{
    bool MoveNext();
    T Current { get; }
    void Reset();
    // NEW!
    IEnumerator<T> Clone();
}

I cannot come up with an implementation of IEnumerable that would not be able to supply an efficiently cloneable IEnumerator (vectors, linked lists, etc. all would be able to provide a trivial implementation of IEnumerator's Clone() as specified above... it would be easier than providing a Reset() method anyway!).

The absence of the Clone method means that any functional/recursive idiom of enumerating over a sequence won't work.

It also means I can't "seamlessly" make IEnumerable's behave like Lisp "lists" (for which you use car/cdr to enumerate recursively). i.e. the only implemention of "(cdr some IEnumerable)" would be woefully inefficient.

Can anyone suggest a realistic, useful, example of an IEnumerable object that wouldn't be able to provide an efficient "Clone()" method? Is it that there'd be a problem with the "yield" construct?

Can anyone suggest a workaround?

Solution

The logic is inexorable! IEnumerable doesn't support Clone, and you need Clone, so you shouldn't be using IEnumerable.

Or more accurately, you shouldn't be using it as the fundamental basis for work on a Scheme interpreter. Why not make a trivial immutable linked list instead?

public class Link<TValue>
{
    private readonly TValue value;
    private readonly Link<TValue> next;

    public Link(TValue value, Link<TValue> next)
    {
        this.value = value;
        this.next = next;
    } 

    public TValue Value 
    { 
        get { return value; }
    }

    public Link<TValue> Next 
    {
        get { return next; }
    }

    public IEnumerable<TValue> ToEnumerable()
    {
        for (Link<TValue> v = this; v != null; v = v.next)
            yield return v.value;
    }
}

Note that the ToEnumerable method gives you convenient usage in the standard C# way.

To answer your question:

Can anyone suggest a realistic, useful, example of an IEnumerable object that wouldn't be able to provide an efficient "Clone()" method? Is it that there'd be a problem with the "yield" construct?

An IEnumerable can go anywhere in the world for its data. Here's an example that reads lines from the console:

IEnumerable<string> GetConsoleLines()
{
    for (; ;)
        yield return Console.ReadLine();
}

There are two problems with this: firstly, a Clone function would not be particularly straightforward to write (and Reset would be meaningless). Secondly, the sequence is infinite - which is perfectly allowable. Sequences are lazy.

Another example:

IEnumerable<int> GetIntegers()
{
    for (int n = 0; ; n++)
        yield return n;
}

For both these examples, the "workaround" you've accepted would not be much use, because it would just exhaust the available memory or hang up forever. But these are perfectly valid examples of sequences.

To understand C# and F# sequences, you need to look at lists in Haskell, not lists in Scheme.

In case you think the infinite stuff is a red herring, how about reading the bytes from a socket:

IEnumerable<byte> GetSocketBytes(Socket s)
{
    byte[] buffer = new bytes[100];
    for (;;)
    {
        int r = s.Receive(buffer);
        if (r == 0)
            yield break;

        for (int n = 0; n < r; n++)
            yield return buffer[n];       
    }
}

If there is some number of bytes being sent down the socket, this will not be an infinite sequence. And yet writing Clone for it would be very difficult. How would the compiler generate the IEnumerable implementation to do it automatically?

As soon as there was a Clone created, both instances would now have to work from a buffer system that they shared. It's possible, but in practice it isn't needed - this isn't how these kinds of sequences are designed to be used. You treat them purely "functionally", like values, applying filters to them recursively, rather than "imperatively" remembering a location within the sequence. It's a little cleaner than low-level car/cdr manipulation.

Further question:

I wonder, what's the lowest level "primitive(s)" I would need such that anything I might want to do with an IEnumerable in my Scheme interpreter could be implemented in scheme rather than as a builtin.

The short answer I think would be to look in Abelson and Sussman and in particular the part about streams. IEnumerable is a stream, not a list. And they describe how you need special versions of map, filter, accumulate, etc. to work with them. They also get onto the idea of unifying lists and streams in section 4.2.

OTHER TIPS

As a workaround, you could easily make an extension method for IEnumerator which did your cloning. Just create a list from the enumerator, and use the elements as members.

You'd lose the streaming capabilities of an enumerator, though - since you're new "clone" would cause the first enumerator to fully evaluate.

If you can let the original enumerator go, ie. not use it any more, you can implement a "clone" function that takes the original enumerator, and uses it as the source for one or more enumerators.

In other words, you could build something like this:

IEnumerable<String> original = GetOriginalEnumerable();
IEnumerator<String>[] newOnes = original.GetEnumerator().AlmostClone(2);
                                                         ^- extension method
                                                         produce 2
                                                         new enumerators

These could internally share the original enumerator, and a linked list, to keep track of the enumerated values.

This would allow for:

Infinite sequences, as long as both enumerators progress forward (the linked list would be written such that once both enumerators have passed a specific point, those can be GC'ed)
Lazy enumeration, the first of the two enumerators that need a value that hasn't been retrieved from the original enumerator yet, it would obtain it and store it into the linked list before yielding it

Problem here is of course that it would still require a lot of memory if one of the enumerators move far ahead of the other one.

Here is the source code. If you use Subversion, you can download the Visual Studio 2008 solution file with a class library with the code below, as well as a separate unit test porject.

Repository: http://vkarlsen.serveftp.com:81/svnStackOverflow/SO847655
Username and password is both 'guest', without the quotes.

Note that this code is not thread-safe, at all.

public static class EnumeratorExtensions
{
    /// <summary>
    /// "Clones" the specified <see cref="IEnumerator{T}"/> by wrapping it inside N new
    /// <see cref="IEnumerator{T}"/> instances, each can be advanced separately.
    /// See remarks for more information.
    /// </summary>
    /// <typeparam name="T">
    /// The type of elements the <paramref name="enumerator"/> produces.
    /// </typeparam>
    /// <param name="enumerator">
    /// The <see cref="IEnumerator{T}"/> to "clone".
    /// </param>
    /// <param name="clones">
    /// The number of "clones" to produce.
    /// </param>
    /// <returns>
    /// An array of "cloned" <see cref="IEnumerator[T}"/> instances.
    /// </returns>
    /// <remarks>
    /// <para>The cloning process works by producing N new <see cref="IEnumerator{T}"/> instances.</para>
    /// <para>Each <see cref="IEnumerator{T}"/> instance can be advanced separately, over the same
    /// items.</para>
    /// <para>The original <paramref name="enumerator"/> will be lazily evaluated on demand.</para>
    /// <para>If one enumerator advances far beyond the others, the items it has produced will be kept
    /// in memory until all cloned enumerators advanced past them, or they are disposed of.</para>
    /// </remarks>
    /// <exception cref="ArgumentNullException">
    /// <para><paramref name="enumerator"/> is <c>null</c>.</para>
    /// </exception>
    /// <exception cref="ArgumentOutOfRangeException">
    /// <para><paramref name="clones"/> is less than 2.</para>
    /// </exception>
    public static IEnumerator<T>[] Clone<T>(this IEnumerator<T> enumerator, Int32 clones)
    {
        #region Parameter Validation

        if (Object.ReferenceEquals(null, enumerator))
            throw new ArgumentNullException("enumerator");
        if (clones < 2)
            throw new ArgumentOutOfRangeException("clones");

        #endregion

        ClonedEnumerator<T>.EnumeratorWrapper wrapper = new ClonedEnumerator<T>.EnumeratorWrapper
        {
            Enumerator = enumerator,
            Clones = clones
        };
        ClonedEnumerator<T>.Node node = new ClonedEnumerator<T>.Node
        {
            Value = enumerator.Current,
            Next = null
        };

        IEnumerator<T>[] result = new IEnumerator<T>[clones];
        for (Int32 index = 0; index < clones; index++)
            result[index] = new ClonedEnumerator<T>(wrapper, node);
        return result;
    }
}

internal class ClonedEnumerator<T> : IEnumerator<T>, IDisposable
{
    public class EnumeratorWrapper
    {
        public Int32 Clones { get; set; }
        public IEnumerator<T> Enumerator { get; set; }
    }

    public class Node
    {
        public T Value { get; set; }
        public Node Next { get; set; }
    }

    private Node _Node;
    private EnumeratorWrapper _Enumerator;

    public ClonedEnumerator(EnumeratorWrapper enumerator, Node firstNode)
    {
        _Enumerator = enumerator;
        _Node = firstNode;
    }

    public void Dispose()
    {
        _Enumerator.Clones--;
        if (_Enumerator.Clones == 0)
        {
            _Enumerator.Enumerator.Dispose();
            _Enumerator.Enumerator = null;
        }
    }

    public T Current
    {
        get
        {
            return _Node.Value;
        }
    }

    Object System.Collections.IEnumerator.Current
    {
        get
        {
            return Current;
        }
    }

    public Boolean MoveNext()
    {
        if (_Node.Next != null)
        {
            _Node = _Node.Next;
            return true;
        }

        if (_Enumerator.Enumerator.MoveNext())
        {
            _Node.Next = new Node
            {
                Value = _Enumerator.Enumerator.Current,
                Next = null
            };
            _Node = _Node.Next;
            return true;
        }

        return false;
    }

    public void Reset()
    {
        throw new NotImplementedException();
    }
}

This uses reflection to create a new instance and then sets the values on the new instance. I also found this chapter from C# in Depth to be very useful. Iterator block implementation details: auto-generated state machines

static void Main()
{
    var counter = new CountingClass();
    var firstIterator = counter.CountingEnumerator();
    Console.WriteLine("First list");
    firstIterator.MoveNext();
    Console.WriteLine(firstIterator.Current);

    Console.WriteLine("First list cloned");
    var secondIterator = EnumeratorCloner.Clone(firstIterator);

    Console.WriteLine("Second list");
    secondIterator.MoveNext();
    Console.WriteLine(secondIterator.Current);
    secondIterator.MoveNext();
    Console.WriteLine(secondIterator.Current);
    secondIterator.MoveNext();
    Console.WriteLine(secondIterator.Current);

    Console.WriteLine("First list");
    firstIterator.MoveNext();
    Console.WriteLine(firstIterator.Current);
    firstIterator.MoveNext();
    Console.WriteLine(firstIterator.Current);
}

public class CountingClass
{
    public IEnumerator<int> CountingEnumerator()
    {
        int i = 1;
        while (true)
        {
            yield return i;
            i++;
        }
    }
}

public static class EnumeratorCloner
{
    public static T Clone<T>(T source) where T : class, IEnumerator
    {
        var sourceType = source.GetType().UnderlyingSystemType;
        var sourceTypeConstructor = sourceType.GetConstructor(new Type[] { typeof(Int32) });
        var newInstance = sourceTypeConstructor.Invoke(new object[] { -2 }) as T;

        var nonPublicFields = source.GetType().GetFields(BindingFlags.NonPublic | BindingFlags.Instance);
        var publicFields = source.GetType().GetFields(BindingFlags.Public | BindingFlags.Instance);
        foreach (var field in nonPublicFields)
        {
            var value = field.GetValue(source);
            field.SetValue(newInstance, value);
        }
        foreach (var field in publicFields)
        {
            var value = field.GetValue(source);
            field.SetValue(newInstance, value);
        }
        return newInstance;
    }
}

This answer was also used on the following question Is it possible to clone an IEnumerable instance, saving a copy of the iteration state?

Why not this as an extension method:

public static IEnumerator<T> Clone(this IEnumerator<T> original)
{
    foreach(var v in original)
        yield return v;
}

This would basically create and return a new enumerator without fully evaluating the original.

Edit: Yep, I misread. Paul is correct, this would only work with IEnumerable.

This might help. It needs some code to call the Dispose() on the IEnumerator:

class Program
{
    static void Main(string[] args)
    {
        //var list = MyClass.DequeueAll().ToList();
        //var list2 = MyClass.DequeueAll().ToList();

        var clonable = MyClass.DequeueAll().ToClonable();


        var list = clonable.Clone().ToList();
        var list2 = clonable.Clone()ToList();
        var list3 = clonable.Clone()ToList();
    }
}

class MyClass
{
    static Queue<string> list = new Queue<string>();

    static MyClass()
    {
        list.Enqueue("one");
        list.Enqueue("two");
        list.Enqueue("three");
        list.Enqueue("four");
        list.Enqueue("five");
    }

    public static IEnumerable<string> DequeueAll()
    {
        while (list.Count > 0)
            yield return list.Dequeue();
    }
}

static class Extensions
{
    public static IClonableEnumerable<T> ToClonable<T>(this IEnumerable<T> e)
    {
        return new ClonableEnumerable<T>(e);
    }
}

class ClonableEnumerable<T> : IClonableEnumerable<T>
{
    List<T> items = new List<T>();
    IEnumerator<T> underlying;

    public ClonableEnumerable(IEnumerable<T> underlying)
    {
        this.underlying = underlying.GetEnumerator();
    }

    public IEnumerator<T> GetEnumerator()
    {
        return new ClonableEnumerator<T>(this);
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }

    private object GetPosition(int position)
    {
        if (HasPosition(position))
            return items[position];

        throw new IndexOutOfRangeException();
    }

    private bool HasPosition(int position)
    {
        lock (this)
        {
            while (items.Count <= position)
            {
                if (underlying.MoveNext())
                {
                    items.Add(underlying.Current);
                }
                else
                {
                    return false;
                }
            }
        }

        return true;
    }

    public IClonableEnumerable<T> Clone()
    {
        return this;
    }


    class ClonableEnumerator<T> : IEnumerator<T>
    {
        ClonableEnumerable<T> enumerable;
        int position = -1;

        public ClonableEnumerator(ClonableEnumerable<T> enumerable)
        {
            this.enumerable = enumerable;
        }

        public T Current
        {
            get
            {
                if (position < 0)
                    throw new Exception();
                return (T)enumerable.GetPosition(position);
            }
        }

        public void Dispose()
        {
        }

        object IEnumerator.Current
        {
            get { return this.Current; }
        }

        public bool MoveNext()
        {
            if(enumerable.HasPosition(position + 1))
            {
                position++;
                return true;
            }
            return false;
        }

        public void Reset()
        {
            position = -1;
        }
    }


}

interface IClonableEnumerable<T> : IEnumerable<T>
{
    IClonableEnumerable<T> Clone();
}

The purpose of "clonable" enumerators is mainly to be able save iteration position and be able to return to it later. That means, the iterated container must provide more rich interface than just IEnumerable. It is actually something between IEnumerable and IList. Working with IList you can just use integer index as enumerator, or create a simple immutable wrapping class, holding a reference to the list and current position.

If your container does not support random access and can only be iterated forward (like one-directional linked list), it must at least provide ability to get next element, having a reference to the previous one or to some "iteration state" that you can hold in your iterator. So, the interface can look like this:

interface IIterable<T>
{
    IIterator<T> GetIterator(); // returns an iterator positioned at start
    IIterator<T> GetNext(IIterator<T> prev); // returns an iterator positioned at the next element from the given one
}

interface IIterator<T>
{
    T Current { get; }
    IEnumerable<T> AllRest { get; }
}

Note that the iterator is immutable, it can not be "moved forward", we only can ask our iterable container to give us a new iterator pointing to the next position. The benefit of that is that you can store iterators anywhere as long as you need, for example have a stack of iterators and return to previously saved position when you need. You can save current position for later use by assigning to a variable, just as you would do with an integer index.

The AllRest property can be useful if you need to iterate from the given position to the end of container using standard language iteration features, like foraech or LinQ. It won't change the iterator position (remember, our iterator is immutable). The implementation can repeatedly GetNext and yleid return.

The GetNext method can actually be a part of iterator itself, like that:

interface IIterable<T>
{
    IIterator<T> GetIterator(); // returns an iterator positioned at start
}

interface IIterator<T>
{
    T Current { get; }
    IIterator<T> GetNext { get; } // returns an iterator positioned at the next element from the given one
    IEnumerable<T> AllRest { get; }
}

This is pretty much the same. The logic of determining the next state is just moved from the container implementation to the iterator implementation. Note that the iterator is still immutable. You can not "move it forward", you only can get another one, pointing to the next element.

There already is a way to create a new enumerator -- the same way you created the first one: IEnumerable.GetEnumerator. I'm not sure why you need another mechanism to do the same thing.

And in the spirit of the DRY principle, I'm curious as to why you would want the responsibility for creating new IEnumerator instances to be duplicated in both your enumerable and your enumerator classes. You would be forcing the enumerator to maintain additional state beyond what's required.

For example, imagine an enumerator for a linked list. For the basic implementation of IEnumerable, that class would only need to keep a reference to the current node. But to support your Clone, it would also need to keep a reference to the head of the list -- something it otherwise has no use for*. Why would you add that extra state to the enumerator, when you can just go to the source (the IEnumerable) and get another enumerator?

And why would you double the number of code paths you need to test? Every time you make a new way to manufacture an object, you're adding complexity.

* You would also need the head pointer if you implemented Reset, but according to the docs, Reset is only there for COM interop, and you're free to throw a NotSupportedException.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow