Question

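In looking at System.Linq.Enumerable through Reflector, I noticed that the iterators used by the Select and Where extension methods (for example WhereSelectArrayIterator for arrays and WhereSelectListIterator for List<T>) do not implement the ICollection interface. If I read the code correctly, this makes some other extension methods, such as Count() and ToList(), perform more slowly, because ToList() simply passes the sequence to the List<T>(IEnumerable<T>) constructor shown below: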

public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
    // code above snipped
    if (source is List<TSource>)
    {
        return new WhereSelectListIterator<TSource, TResult>((List<TSource>) source, null, selector);
    }
    // code below snipped
}

private class WhereSelectListIterator<TSource, TResult> : Enumerable.Iterator<TResult>
{
    // Fields
    private List<TSource> source; // the iterator holds the source List, so it could implement ICollection
    // code below snipped
}


public class List<T> : IList<T>, ICollection<T>, IEnumerable<T>, IList, ICollection, IEnumerable
{
    public List(IEnumerable<T> collection)
    {
        ICollection<T> is2 = collection as ICollection<T>;
        if (is2 != null)
        {
            int count = is2.Count;
            this._items = new T[count];
            is2.CopyTo(this._items, 0); // FAST
            this._size = count;
        }
        else
        {
            this._size = 0;
            this._items = new T[4];
            using (IEnumerator<T> enumerator = collection.GetEnumerator())
            {
                while (enumerator.MoveNext())
                {
                    this.Add(enumerator.Current);  // SLOW, CAUSES ARRAY EXPANSION
                }
            }
        }
    }
}
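A quick way to see which branch of that constructor a given argument takes is to test for ICollection<T> directly. This is a minimal sketch; `PathInspector` and `TakesFastPath` are hypothetical names, and the check simply mirrors the `collection as ICollection<T>` test in the decompiled constructor. As far as I know, the Select iterators implement IEnumerable<T> but not ICollection<T>, so they fall through to the slow branch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: reports which branch of List<T>(IEnumerable<T>) a source would hit.
public static class PathInspector
{
    // True when the constructor can use the Count + CopyTo fast path.
    public static bool TakesFastPath<T>(IEnumerable<T> source) => source is ICollection<T>;

    public static void Run()
    {
        var source = new List<int> { 1, 2, 3 };
        // List<T> implements ICollection<T>, so this takes the fast path.
        Console.WriteLine(TakesFastPath(source));
        // The Select iterator does not implement ICollection<T>, so this falls back to Add().
        Console.WriteLine(TakesFastPath(source.Select(k => k)));
    }
}
```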

I've tested this with results confirming my suspicion:

ICollection: 2388.5222 ms

IEnumerable: 3308.3382 ms

Here's the test code:

    // prepare source
    var n = 10000;
    var source = new List<int>(n);
    for (int i = 0; i < n; i++) source.Add(i);

    // Test List creation using ICollection
    var startTime = DateTime.Now;
    for (int i = 0; i < n; i++)
    {
        foreach (int l in source.Select(k => k)) { } // iterate the Select as well, so the comparison is fair
        new List<int>(source);
    }
    var finishTime = DateTime.Now;
    Response.Write("ICollection: " + (finishTime - startTime).TotalMilliseconds + " ms <br />");

    // Test List creation using IEnumerable
    startTime = DateTime.Now;
    for (int i = 0; i < n; i++) new List<int>(source.Select(k => k));
    finishTime = DateTime.Now;
    Response.Write("IEnumerable: " + (finishTime - startTime).TotalMilliseconds + " ms");
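For more repeatable numbers, the same comparison can be sketched as a console program that uses System.Diagnostics.Stopwatch, which has a much finer resolution than DateTime.Now, plus one warm-up call so JIT compilation is not part of the measurement. `ListBenchmark`, `Time` and `Run` are hypothetical names. The timings it prints depend on the machine and runtime version, and no particular results are implied.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical Stopwatch-based variant of the timing test above.
public static class ListBenchmark
{
    // Runs the action once as a warm-up, then `iterations` more times under the stopwatch.
    public static TimeSpan Time(int iterations, Action action)
    {
        action(); // warm-up so JIT compilation is not timed
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) action();
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Run(int n = 10000)
    {
        List<int> source = Enumerable.Range(0, n).ToList();

        // ICollection<T> path, plus the same Select pass the original test adds for fairness.
        TimeSpan viaCollection = Time(n, () =>
        {
            foreach (int item in source.Select(k => k)) { }
            new List<int>(source);
        });

        // IEnumerable<T> path: element-by-element Add with array growth.
        TimeSpan viaEnumerable = Time(n, () => new List<int>(source.Select(k => k)));

        Console.WriteLine($"ICollection: {viaCollection.TotalMilliseconds} ms");
        Console.WriteLine($"IEnumerable: {viaEnumerable.TotalMilliseconds} ms");
    }
}
```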

Am I missing something, or will this be fixed in a future version of the framework?

Thank you for your thoughts.


Solution

LINQ to Objects uses some tricks to optimize certain operations. For example, if you chain two .Where calls together, the predicates are combined into a single WhereArrayIterator, so the earlier iterator can be garbage collected. Likewise, a Where followed by a Select creates a WhereSelectArrayIterator, passing the combined predicate as an argument so that the original WhereArrayIterator can be garbage collected. So the WhereSelectArrayIterator is responsible for tracking not only the selector, but also the combined predicate (if any) that it was built from.
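The fusion described above can be illustrated with a simplified sketch. This is not the actual System.Linq source: `FusedWhere` and `FusedWhereSelect` are hypothetical types that only mimic the idea, namely that a second Where produces one iterator with a combined predicate, and a following Select folds that predicate and the selector into a single object.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a Where-over-array iterator.
public sealed class FusedWhere<T>
{
    private readonly T[] source;
    private readonly Func<T, bool> predicate;

    public FusedWhere(T[] source, Func<T, bool> predicate)
    {
        this.source = source;
        this.predicate = predicate;
    }

    // A second Where returns ONE new iterator with the conjunction of both predicates.
    // Capturing the delegate in a local (not `this`) lets this instance be collected.
    public FusedWhere<T> Where(Func<T, bool> next)
    {
        Func<T, bool> first = predicate;
        return new FusedWhere<T>(source, x => first(x) && next(x));
    }

    // Select hands the combined predicate plus the selector to a single fused iterator.
    public FusedWhereSelect<T, TResult> Select<TResult>(Func<T, TResult> selector) =>
        new FusedWhereSelect<T, TResult>(source, predicate, selector);
}

// Hypothetical stand-in for a Where+Select-over-array iterator.
public sealed class FusedWhereSelect<T, TResult>
{
    private readonly T[] source;
    private readonly Func<T, bool> predicate;
    private readonly Func<T, TResult> selector;

    public FusedWhereSelect(T[] source, Func<T, bool> predicate, Func<T, TResult> selector)
    {
        this.source = source;
        this.predicate = predicate;
        this.selector = selector;
    }

    // One pass over the array: test the combined predicate, then project.
    public IEnumerable<TResult> Enumerate()
    {
        foreach (T item in source)
            if (predicate(item))
                yield return selector(item);
    }
}
```

For example, `new FusedWhere<int>(new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }, x => x > 2).Where(x => x % 2 == 0).Select(x => x * 10)` enumerates to 40, 60, 80, 100 in a single pass.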

The source field only keeps track of the initial list that was given. Because of the predicate, the iteration result will not always have the same number of items as the source does. Since LINQ is intended to be lazily evaluated, it shouldn't evaluate the source against the predicate ahead of time just so that it can potentially save time if someone ends up calling .Count(). That would cost just as much as calling .ToList() on it manually, and if the user ran it through multiple Where and Select clauses, you'd end up constructing multiple lists unnecessarily.
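A small experiment makes the lazy-evaluation point concrete. Counting predicate calls shows that nothing runs when the query is built, and that Count() has to invoke the predicate for every source element, because the size of the filtered result is not known in advance. `LazyCountDemo` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo: Count() on a filtered query cannot reuse source.Count.
public static class LazyCountDemo
{
    // Returns predicate calls before Count(), predicate calls after Count(), and the count.
    public static (int before, int after, int count) Run(int n)
    {
        int predicateCalls = 0;
        List<int> source = Enumerable.Range(0, n).ToList();

        // Building the query evaluates nothing yet (deferred execution).
        IEnumerable<int> query = source
            .Where(x => { predicateCalls++; return x % 3 == 0; })
            .Select(x => x * 2);

        int before = predicateCalls;
        int count = query.Count(); // forces the predicate to run for every element
        return (before, predicateCalls, count);
    }
}
```

`LazyCountDemo.Run(10)` returns `(0, 10, 4)`: no predicate calls until Count(), then one call per element, and four of the values 0..9 are divisible by 3.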

Could LINQ to Objects be refactored to create a SelectArrayIterator that it uses when Select gets called directly on an array? Sure. Would it enhance performance? A little bit. At what cost? Less code reuse means additional code to maintain and test moving forward.
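For the unfiltered case, such a dedicated iterator could expose Count and CopyTo, because with no predicate the result has exactly as many elements as the source array. The following hypothetical sketch (`SelectArrayCollection` is an illustrative name, not a LINQ type) implements ICollection<TResult>, so `new List<TResult>(...)` would take the fast path. It also shows the extra surface area (Contains and the unsupported mutators) that such a class brings along, which is part of the maintenance cost mentioned above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical Select-over-array projection that is also an ICollection<TResult>.
public sealed class SelectArrayCollection<TSource, TResult> : ICollection<TResult>
{
    private readonly TSource[] source;
    private readonly Func<TSource, TResult> selector;

    public SelectArrayCollection(TSource[] source, Func<TSource, TResult> selector)
    {
        this.source = source ?? throw new ArgumentNullException(nameof(source));
        this.selector = selector ?? throw new ArgumentNullException(nameof(selector));
    }

    // No predicate, so the count is known without running the selector.
    public int Count => source.Length;
    public bool IsReadOnly => true;

    // Lets List<T>(IEnumerable<T>) fill a pre-sized array in one pass.
    public void CopyTo(TResult[] array, int arrayIndex)
    {
        if (array == null) throw new ArgumentNullException(nameof(array));
        if (arrayIndex < 0 || array.Length - arrayIndex < source.Length)
            throw new ArgumentException("Destination array is too small.");
        for (int i = 0; i < source.Length; i++)
            array[arrayIndex + i] = selector(source[i]);
    }

    public IEnumerator<TResult> GetEnumerator()
    {
        foreach (TSource item in source)
            yield return selector(item);
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public bool Contains(TResult item)
    {
        var comparer = EqualityComparer<TResult>.Default;
        foreach (TResult value in this)
            if (comparer.Equals(value, item)) return true;
        return false;
    }

    // The projection is read-only, so mutating members are not supported.
    public void Add(TResult item) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public bool Remove(TResult item) => throw new NotSupportedException();
}
```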

And thus we get to the crux of the vast majority of "Why doesn't language/platform X have feature Y" questions: every feature and optimization has some cost associated with it, and even Microsoft doesn't have unlimited resources. Just like every other company out there, they make judgment calls to determine how often code will be run that performs a Select on an array and then calls .ToList() on it, and whether making that run a little faster is worth writing and maintaining another class in the LINQ package.

Licensed under: CC-BY-SA with attribution