質問

Today, I faced a problem with performance while iterating through a list of items. After done some diagnostic, I finally figured out the reason which slowed down performance. It turned out that iterating through an IEnumerable<T> took much more time than iterating through a List<T>. Please help me understand why IEnumerable<T> is slower than List<T>.

UPDATE benchmark context:

I'm using NHibernate to fetch a collection of items from a database into an IEnumerable<T> and sum its property's value. This is just a simple entity without any reference type:

public SimpleEntity
{
    public int Id {get;set}
    public string Name {get;set}
    public decimal Price {get;set}
}

Public Test
{
    void Main()
    {
        //this query get a list of about 200 items
        IEnumerable<SimpleEntity> entities = from entity in Session.Query<SimpleEntity>
                                             select entity;

        decimal value = 0.0;
        foreach(SimpleEntity item in entities)
        {
             //this for loop took 1.5 seconds 
             value += item.Price;
        }

        List<SimpleEntity> lstEntities = entities.ToList();

        foreach(SimpleEntity item in lstEntities)
        {
             //this for loop took less than a milisecond
             value += item.Price;
        }
    }
}
役に立ちましたか?

解決 2

List<T> is an IEnumerable<T>. When you are iterating through your List<T>, you are performing the same sequence of operations as you are for any other IEnumerable<T>:

  • Get an IEnumerator<T>.
  • Invoke IEnumerator<T>.MoveNext() on your enumerator.
  • Take the IEnumerator<T>.Current element from the IEnumerator interface while MoveNext() returns true.
  • Dispose of the IEnumerator<T>.

What we know about List<T> is that it is an in-memory collection, so the MoveNext() function on its enumerator is going to be very cheap. It looks like your collection gives an enumerator whose MoveNext() method is more expensive, perhaps because it is interacting with some external resource such as a database connection.

When you call ToList() on your IEnumerable<T>, you are running a full iteration of your collection and loading all of the elements into memory with that iteration. This is worth doing if you expect to be iterating through the same collection multiple times. If you expect to iterate through the collection only once, then ToList() is a false economy: all it does is to create an in-memory collection that will later have to be garbage collected.

他のヒント

Enumerating an IEnumerable<T> is 2 to 3 times slower than enumerating the same List<T> directly. This is due to a subtlety on how C# selects its enumerator for a given type.

List<T> exposes 3 enumerators:

  1. List<T>.Enumerator List<T>.GetEnumerator()
  2. IEnumerator<T> IEnumerable<T>.GetEnumerator()
  3. IEnumerator IEnumerable.GetEnumerator()

When C# compiles a foreach loop, it will select the enumerator in the above order. Note that a type doesn't need to implement IEnumerable or IEnumerable<T> to be enumerable, it just needs a method named GetEnumerator() that returns an enumerator.

Now, List<T>.GetEnumerator() has the advantage of being statically typed which makes all calls to List<T>.Enumerator.get_Current and List<T>.Enumerator.MoveNext() static-bound instead of virtual.

10M iterations (coreclr):

for(int i ...)               73 ms
foreach(... List<T>)        215 ms
foreach(... IEnumerable<T>) 698 ms
foreach(... IEnumerable)   1028 ms
for(int *p ...)              50 ms

10M iterations (Framework):

for(int i ...)              210 ms
foreach(... List<T>)        252 ms
foreach(... IEnumerable<T>) 537 ms
foreach(... IEnumerable)    844 ms
for(int *p ...)             202 ms

Disclaimer

I should point out the actual iteration in a list is rarely the bottleneck. Keep in mind those are hundreds of milliseconds over millions of iterations. Any work in the loop more complicated than a few arithmetic operations will be overwhelmingly costlier than the iteration itself.

List<T> is an implementation of IEnumerable<T> interface. To use the foreach syntax, you don't need a List<T> type or a IEnumerable<T> type, but you are required to use a type with a GetEnumerator() method. Quote from Microsoft docs:

The foreach statement isn't limited to those types. You can use it with an >instance of any type that satisfies the following conditions:

  • A type has the public parameterless GetEnumerator method whose return type is either class, struct, or interface type. Beginning with C# 9.0, the GetEnumerator method can be a type's extension method.
  • The return type of the GetEnumerator method has the public Current property and the public parameterless MoveNext method whose return type is Boolean.

Considering for example a LINQ context, performing a query, using an IEnumerable structure you have the advantange of a deferred execution of the query (the query will be executed only when needed), but, using the ToList() method, you're requesting that the query must be executed (or evaluated) immediately and you want your results in memory, saving them in a list, to perform later some operations on them, like changing some values.

About the performance, it depends on what you're trying to do. We don't know which operations you're performing (like fetching data from a database), which collection types you're using and so on.

UPDATE

The reason why you have a different timing between the IEnumerable collection iteration and the List collection iteration, is, like I said, that you have a deferred execution of the query when you're invoking:

IEnumerable<SimpleEntity> entities = from entity in Session.Query<SimpleEntity>
                                             select entity;

That means the query is executed only when you're iterating over the IEnumerable collection. This doesn't happen when you're calling the ToList() method in entities.ToList(); for the reasons I described above.

I believe it has nothing to do with IEnumerable. It's because on the first loop, when you are iterating over the IEnumerable, you are actually executing the query.

Which is completely different from the second case, when you would be executing the query here:

List<SimpleEntity> lstEntities = entities.ToList();

Making the iteration much faster because you are not actually querying the BD and transforming the result to a list while you are in the loop.

If you instead do this:

foreach(SimpleEntity item in entities.ToList())
{
    //this for loop took less than a milisecond
    value += item.Price;
}

Perhaps you would get a similar performance.

You are using linq.

IEnumerable<SimpleEntity> entities = from entity in Session.Query<SimpleEntity>
                                         select entity;

Justs declare the query. It will be executed when foreach gets the enumerator. The 1.5 seconds include the excution of Session.Query<>.

If you measure the line

List<SimpleEntity> lstEntities = entities.ToList();

You should get the 1.5 seconds or at least more than 1 second.

Are you sure your measures are being taken correctly? You should mesaure the second loop including entites.ToList().

Cheers!

ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top