Proper use of 'yield return'

https://stackoverflow.com/questions/410026

03-07-2019

Question

The yield keyword is one of those keywords in C# that continues to mystify me, and I've never been confident that I'm using it correctly.

Of the following two pieces of code, which is preferred, and why?

Version 1: Using yield return

public static IEnumerable<Product> GetAllProducts()
{
    using (AdventureWorksEntities db = new AdventureWorksEntities())
    {
        var products = from product in db.Product
                       select product;

        foreach (Product product in products)
        {
            yield return product;
        }
    }
}

Version 2: Return the list

public static IEnumerable<Product> GetAllProducts()
{
    using (AdventureWorksEntities db = new AdventureWorksEntities())
    {
        var products = from product in db.Product
                       select product;

        return products.ToList<Product>();
    }
}

Solution

I tend to use yield-return when I calculate the next item in the list (or even the next group of items).

Using your Version 2, you must have the complete list before returning. By using yield-return, you really only need to have the next item before returning.

Among other things, this helps spread the computational cost of complex calculations over a larger time-frame. For example, if the list is hooked up to a GUI and the user never goes to the last page, you never calculate the final items in the list.

Another case where yield-return is preferable is if the IEnumerable represents an infinite set. Consider the list of prime numbers, or an infinite list of random numbers. You can never return the full IEnumerable at once, so you use yield-return to return the list incrementally.
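To make the infinite-set case concrete, here is a minimal sketch. The names PrimeDemo, Primes and IsPrime are made up for illustration and are not from the question; the trial-division test is just the simplest thing that works.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrimeDemo
{
    // Hypothetical helper: an endless sequence of primes, produced one at a time.
    public static IEnumerable<int> Primes()
    {
        for (int candidate = 2; ; candidate++)
        {
            if (IsPrime(candidate))
                yield return candidate; // execution pauses here until the caller asks for the next item
        }
    }

    // Naive trial division; good enough for a demonstration.
    private static bool IsPrime(int n)
    {
        for (int d = 2; (long)d * d <= n; d++)
            if (n % d == 0) return false;
        return n >= 2;
    }

    public static void Main()
    {
        // Take(5) stops asking after five items, so the endless loop is never run to completion.
        Console.WriteLine(string.Join(", ", Primes().Take(5))); // 2, 3, 5, 7, 11
    }
}
```

A version that tried to return a List<int> here would never return at all, which is the point the answer is making.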

In your particular example, you have the full list of products, so I'd use Version 2.

OTHER TIPS

Populating a temporary list is like downloading the whole video, whereas using yield is like streaming that video.

As a conceptual example for understanding when you ought to use yield, let's say the method ConsumeLoop() processes the items returned/yielded by ProduceList():

void ConsumeLoop() {
    foreach (Consumable item in ProduceList())        // might have to wait here
        item.Consume();
}

IEnumerable<Consumable> ProduceList() {
    while (KeepProducing())
        yield return ProduceExpensiveConsumable();    // expensive
}

Without yield, the call to ProduceList() might take a long time because you have to complete the list before returning:

//pseudo-assembly
Produce consumable[0]                   // expensive operation, e.g. disk I/O
Produce consumable[1]                   // waiting...
Produce consumable[2]                   // waiting...
Produce consumable[3]                   // completed the consumable list
Consume consumable[0]                   // start consuming
Consume consumable[1]
Consume consumable[2]
Consume consumable[3]

With yield, the steps are interleaved: each item is consumed as soon as it is produced, so the two sort of work "in parallel" (still on a single thread):

//pseudo-assembly
Produce consumable[0]
Consume consumable[0]                   // immediately Consume
Produce consumable[1]
Consume consumable[1]                   // consume next
Produce consumable[2]
Consume consumable[2]                   // consume next
Produce consumable[3]
Consume consumable[3]                   // consume next
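If you want to see that ordering for yourself, a small sketch like the following records it in a log. InterleaveDemo and its members are hypothetical names; the "expensive" production is replaced by a log entry.

```csharp
using System;
using System.Collections.Generic;

public static class InterleaveDemo
{
    public static readonly List<string> Log = new List<string>();

    // Eager: the whole list is built before the caller sees anything.
    public static IEnumerable<int> ProduceList(int count)
    {
        var items = new List<int>();
        for (int i = 0; i < count; i++)
        {
            Log.Add("Produce " + i);
            items.Add(i);
        }
        return items;
    }

    // Lazy: each item is produced only when the consumer asks for it.
    public static IEnumerable<int> ProduceLazily(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Log.Add("Produce " + i);
            yield return i;
        }
    }

    // Consumes two items from the given producer and returns the recorded order.
    public static string Run(Func<int, IEnumerable<int>> producer)
    {
        Log.Clear();
        foreach (int item in producer(2))
            Log.Add("Consume " + item);
        return string.Join(" | ", Log);
    }

    public static void Main()
    {
        Console.WriteLine(Run(ProduceList));   // Produce 0 | Produce 1 | Consume 0 | Consume 1
        Console.WriteLine(Run(ProduceLazily)); // Produce 0 | Consume 0 | Produce 1 | Consume 1
    }
}
```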

And lastly, as many before have already suggested, you should use Version 2 because you already have the completed list anyway.

This is going to seem like a bizarre suggestion, but I learned how to use the yield keyword in C# by reading a presentation on generators in Python: David M. Beazley's http://www.dabeaz.com/generators/Generators.pdf. You don't need to know much Python to understand the presentation - I didn't. I found it very helpful in explaining not just how generators work but why you should care.

I know this is an old question, but I'd like to offer one example of how the yield keyword can be creatively used. I have really benefited from this technique. Hopefully this will be of assistance to anyone else who stumbles upon this question.

Note: Don't think about the yield keyword as merely being another way to build a collection. A big part of the power of yield comes in the fact that execution is paused in your method or property until the calling code iterates over the next value. Here's my example:

Using the yield keyword (alongside Rob Eisenberg's Caliburn.Micro coroutines implementation) allows me to express an asynchronous call to a web service like this:

public IEnumerable<IResult> HandleButtonClick() {
    yield return Show.Busy();

    var loginCall = new LoginResult(wsClient, Username, Password);
    yield return loginCall;
    this.IsLoggedIn = loginCall.Success;

    yield return Show.NotBusy();
}

What this will do is turn my BusyIndicator on, call the Login method on my web service, set my IsLoggedIn flag to the return value, and then turn the BusyIndicator back off.

Here's how this works: IResult has an Execute method and a Completed event. Caliburn.Micro grabs the IEnumerator from the call to HandleButtonClick() and passes it into a Coroutine.BeginExecute method. The BeginExecute method starts iterating through the IResults. When the first IResult is returned, execution is paused inside HandleButtonClick(), and BeginExecute() attaches an event handler to the Completed event and calls Execute(). IResult.Execute() can perform either a synchronous or an asynchronous task and fires the Completed event when it's done.

LoginResult looks something like this:

public class LoginResult : IResult {
    // Constructor to set private members...

    public void Execute(ActionExecutionContext context) {
        wsClient.LoginCompleted += (sender, e) => {
            this.Success = e.Result;
            Completed(this, new ResultCompletionEventArgs());
        };
        wsClient.Login(username, password);
    }

    public event EventHandler<ResultCompletionEventArgs> Completed = delegate { };
    public bool Success { get; private set; }
}

It may help to set up something like this and step through the execution to watch what's going on.
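For stepping through, a stripped-down driver is easier to follow than the real framework. The sketch below is NOT Caliburn.Micro's implementation or API: IStep, LogStep and CoroutineDemo are invented stand-ins that only mimic the idea described above (run a step, wait for it to report completion, then resume the iterator).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for IResult: do some work, then signal completion.
public interface IStep
{
    void Execute(Action completed);
}

// A trivial step that records a message and completes immediately.
public sealed class LogStep : IStep
{
    private readonly List<string> log;
    private readonly string message;

    public LogStep(List<string> log, string message)
    {
        this.log = log;
        this.message = message;
    }

    public void Execute(Action completed)
    {
        log.Add(message);
        completed(); // a real step could call this later, e.g. from an event handler
    }
}

public static class CoroutineDemo
{
    // Advances the iterator one step at a time. The iterator body stays paused
    // at its current yield return until the step reports completion.
    public static void Run(IEnumerator<IStep> steps)
    {
        if (steps.MoveNext())
            steps.Current.Execute(() => Run(steps));
        else
            steps.Dispose();
    }

    public static IEnumerable<IStep> Workflow(List<string> log)
    {
        yield return new LogStep(log, "busy on");
        log.Add("between steps"); // runs only after the first step has completed
        yield return new LogStep(log, "busy off");
    }

    public static void Main()
    {
        var log = new List<string>();
        Run(Workflow(log).GetEnumerator());
        Console.WriteLine(string.Join(", ", log)); // busy on, between steps, busy off
    }
}
```

Because the driver only calls MoveNext() from the completion callback, a step that finishes asynchronously would delay the rest of Workflow without blocking a thread.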

Hope this helps someone out! I've really enjoyed exploring the different ways yield can be used.

Yield return can be very powerful for algorithms where you need to iterate through millions of objects. Consider the following example where you need to calculate possible trips for rideshare. First we generate possible trips:

    static IEnumerable<Trip> CreatePossibleTrips()
    {
        for (int i = 0; i < 1000000; i++)
        {
            yield return new Trip
            {
                Id = i.ToString(),
                Driver = new Driver { Id = i.ToString() }
            };
        }
    }

Then iterate through each trip:

    static void Main(string[] args)
    {
        foreach (var trip in CreatePossibleTrips())
        {
            // possible trip is actually calculated only at this point, because of yield
            if (IsTripGood(trip))
            {
                // match good trip
            }
        }
    }

If you use List instead of yield, you will need to allocate 1 million objects in memory at once (~190 MB), and this simple example will take ~1400 ms to run. However, if you use yield, you don't need to keep all these temporary objects in memory, and the algorithm is significantly faster: this example takes only ~400 ms to run, and only one trip needs to be alive at any moment.

The two pieces of code are really doing two different things. The first version will pull members as you need them. The second version will load all the results into memory before you start to do anything with it.

There's no right or wrong answer to this one. Which one is preferable just depends on the situation. For example, if there's a time limit for completing your query and you need to do something semi-complicated with the results, the second version could be preferable. But beware of large result sets, especially if you're running this code in 32-bit mode. I've been bitten by OutOfMemory exceptions several times when using this approach.

The key thing to keep in mind is this though: the differences are in efficiency. Thus, you should probably go with whichever one makes your code simpler and change it only after profiling.
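One behavioural difference worth seeing before you decide: with Version 1 the using block is entered only when iteration starts and is left when iteration ends (or is abandoned). The sketch below shows that with a fake context; FakeContext and DisposeDemo are hypothetical stand-ins, not Entity Framework types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for AdventureWorksEntities that logs its lifetime.
public sealed class FakeContext : IDisposable
{
    private readonly List<string> log;

    public FakeContext(List<string> log)
    {
        this.log = log;
        log.Add("open");
    }

    public IEnumerable<string> Products
    {
        get { return new[] { "A", "B", "C" }; }
    }

    public void Dispose()
    {
        log.Add("dispose");
    }
}

public static class DisposeDemo
{
    // Same shape as Version 1 of the question.
    public static IEnumerable<string> GetAllProducts(List<string> log)
    {
        using (var db = new FakeContext(log))
        {
            foreach (string product in db.Products)
                yield return product;
        }
    }

    public static string Run()
    {
        var log = new List<string>();
        IEnumerable<string> products = GetAllProducts(log);
        log.Add("called"); // nothing has run yet, so "open" is not in the log
        foreach (string product in products)
        {
            log.Add("got " + product);
            break; // leaving the loop early disposes the enumerator...
        }
        return string.Join(", ", log); // ...which runs the using block's Dispose
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // called, open, got A, dispose
    }
}
```

So the context in Version 1 stays open for as long as the caller is iterating, whereas Version 2 opens it, reads everything, and closes it before returning.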

Yield has two great uses

It helps to provide custom iteration without creating temporary collections (instead of loading all the data and then looping).

It helps to do stateful iteration (streaming).

Below is a simple video which I created, with a full demonstration of the two points above:

http://www.youtube.com/watch?v=4fju3xcm21M

This is what Chris Sells says about those statements in The C# Programming Language:

I sometimes forget that yield return is not the same as return, in that the code after a yield return can be executed. For example, the code after the first return here can never be executed:

int F() {
    return 1;
    return 2; // Can never be executed
}

In contrast, the code after the first yield return here can be executed:

IEnumerable<int> F() {
    yield return 1;
    yield return 2; // Can be executed
}

This often bites me in an if statement:

IEnumerable<int> F() {
    if(...) { yield return 1; } // I mean this to be the only
                                // thing returned
    yield return 2; // Oops!
}

In these cases, remembering that yield return is not “final” like return is helpful.
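When you do want the "final" behaviour inside an iterator, C# has yield break for that. A minimal sketch of the if-statement case, with a made-up onlyOne parameter standing in for the elided condition:

```csharp
using System;
using System.Collections.Generic;

public static class YieldBreakDemo
{
    public static IEnumerable<int> F(bool onlyOne)
    {
        if (onlyOne)
        {
            yield return 1;
            yield break; // ends the sequence here, like return in an ordinary method
        }
        yield return 2;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", F(true)));  // 1
        Console.WriteLine(string.Join(", ", F(false))); // 2
    }
}
```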

Assuming your products LINQ class uses a similar yield for enumerating/iterating, the first version is more efficient because it's only yielding one value each time it's iterated over.

The second example is converting the enumerator/iterator to a list with the ToList() method. This means it manually iterates over all the items in the enumerator and then returns a flat list.

This is kinda beside the point, but since the question is tagged best-practices I'll go ahead and throw in my two cents. For this type of thing I greatly prefer to make it into a property:

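public static IEnumerable<Product> AllProducts
{
    get {
        using (AdventureWorksEntities db = new AdventureWorksEntities()) {
            var products = from product in db.Product
                           select product;

            // Materialize the query before the context is disposed; returning
            // the deferred query itself would fail when it is enumerated later.
            return products.ToList();
        }
    }
}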

Sure, it's a little more boilerplate, but the code that uses this will look much cleaner. With the property:

prices = Whatever.AllProducts.Select(product => product.price);

versus with the method:

prices = Whatever.GetAllProducts().Select(product => product.price);

Note: I wouldn't do this for any methods that may take a while to do their work.

And what about this?

public static IEnumerable<Product> GetAllProducts()
{
    using (AdventureWorksEntities db = new AdventureWorksEntities())
    {
        var products = from product in db.Product
                       select product;

        return products.ToList();
    }
}

I guess this is much cleaner. I do not have VS2008 at hand to check, though. In any case, if Products implements IEnumerable (as it seems to - it is used in a foreach statement), I'd return it directly.

I would have used Version 2 of the code in this case. Since you have the full list of products available, and that's what the "consumer" of this method call expects, you need to send the complete information back to the caller.

If the caller of this method needs one piece of information at a time, and consumes the next piece on demand, then it would be beneficial to use yield return, which makes sure control is returned to the caller as soon as a unit of information is available.

Some examples where one could use yield return are:

Complex, step-by-step calculations where the caller is waiting for the data of one step at a time
Paging in a GUI, where the user might never reach the last page and only a subset of the information needs to be shown on the current page

To answer your question, I would have used Version 2.

Return the list directly. Benefits:

It's more clear
~~The list is reusable. (the iterator is not)~~ not actually true, Thanks Jon

You should use the iterator (yield) when you think you probably won't have to iterate all the way to the end of the list, or when it has no end. For example, if the calling client is going to search for the first product that satisfies some predicate, you might consider using the iterator, although that's a contrived example, and there are probably better ways to accomplish it. Basically, if you know in advance that the whole list will need to be calculated, just do it up front. If you think that it won't, then consider using the iterator version.

The yield return keyphrase lets the compiler maintain the state machine for iterating a particular collection. Wherever the C# compiler (not the CLR) sees yield return being used, it generates a class that implements the enumerator pattern for that piece of code. This saves the developer from all the plumbing we would otherwise have to write by hand in the absence of the keyword.

Suppose the developer is filtering some collection: iterating through the collection and then copying the matching objects into some new collection. This kind of plumbing is quite monotonous.
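As a rough illustration of the filtering case, here is what the yield version can look like. FilterDemo and Filter are hypothetical names; in real code LINQ's Where already does this job.

```csharp
using System;
using System.Collections.Generic;

public static class FilterDemo
{
    // No temporary list and no hand-written enumerator class:
    // the compiler generates the state machine behind this method.
    public static IEnumerable<T> Filter<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
                yield return item;
        }
    }

    public static void Main()
    {
        var evens = Filter(new[] { 1, 2, 3, 4, 5, 6 }, n => n % 2 == 0);
        Console.WriteLine(string.Join(", ", evens)); // 2, 4, 6
    }
}
```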

More about the keyword here at this article.

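The usage of yield return is similar to the keyword return, except that the method returns a lazily evaluated sequence (what other languages call a generator). Each enumerator traverses the sequence only once, forward only; enumerating the sequence again re-runs the method body.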

yield has two benefits:

You do not need to read these values twice;
You can get many child nodes but do not have to put them all in memory.
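One consequence that is easy to miss: because nothing is stored, every new enumeration of an iterator method's result runs the method body again. A small sketch, with hypothetical names and a counter added only to make the re-runs visible:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReEnumerationDemo
{
    public static int Runs; // counts how often the iterator body has started

    public static IEnumerable<int> Numbers()
    {
        Runs++;
        yield return 1;
        yield return 2;
    }

    public static void Main()
    {
        IEnumerable<int> numbers = Numbers(); // body has not started: Runs is still 0
        int first = numbers.Sum();            // first enumeration runs the body
        int second = numbers.Sum();           // second enumeration runs it again
        Console.WriteLine(Runs);              // 2

        List<int> cached = Numbers().ToList(); // one more run, results are kept
        int third = cached.Sum();              // reading the list does not re-run anything
        Console.WriteLine(Runs);               // 3
    }
}
```

If the body is expensive and you need the values more than once, calling ToList() once and reusing the list is usually the simpler choice.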

There is another clear explanation that may help you.

Licensed under: CC-BY-SA with attribution
