Is yield useful outside of LINQ?

https://stackoverflow.com/questions/317619

11-07-2019
|

Question

When ever I think I can use the yield keyword, I take a step back and look at how it will impact my project. I always end up returning a collection instead of yeilding because I feel the overhead of maintaining the state of the yeilding method doesn't buy me much. In almost all cases where I am returning a collection I feel that 90% of the time, the calling method will be iterating over all elements in the collection, or will be seeking a series of elements throughout the entire collection.

I do understand its usefulness in linq, but I feel that only the linq team is writing such complex queriable objects that yield is useful.

Has anyone written anything like or not like linq where yield was useful?

Solution

I recently had to make a representation of mathematical expressions in the form of an Expression class. When evaluating the expression I have to traverse the tree structure with a post-order treewalk. To achieve this I implemented IEnumerable<T> like this:

public IEnumerator<Expression<T>> GetEnumerator()
{
    if (IsLeaf)
    {
        yield return this;
    }
    else
    {
        foreach (Expression<T> expr in LeftExpression)
        {
            yield return expr;
        }
        foreach (Expression<T> expr in RightExpression)
        {
            yield return expr;
        }
        yield return this;
    }
}

Then I can simply use a foreach to traverse the expression. You can also add a Property to change the traversal algorithm as needed.

OTHER TIPS

Note that with yield, you are iterating over the collection once, but when you build a list, you'll be iterating over it twice.

Take, for example, a filter iterator:

IEnumerator<T>  Filter(this IEnumerator<T> coll, Func<T, bool> func)
{
     foreach(T t in coll)
        if (func(t))  yield return t;
}

Now, you can chain this:

 MyColl.Filter(x=> x.id > 100).Filter(x => x.val < 200).Filter (etc)

You method would be creating (and tossing) three lists. My method iterates over it just once.

Also, when you return a collection, you are forcing a particular implementation on you users. An iterator is more generic.

I do understand its usefulness in linq, but I feel that only the linq team is writing such complex queriable objects that yield is useful.

Yield was useful as soon as it got implemented in .NET 2.0, which was long before anyone ever thought of LINQ.

Why would I write this function:

IList<string> LoadStuff() {
  var ret = new List<string>();
  foreach(var x in SomeExternalResource)
    ret.Add(x);
  return ret;
}

When I can use yield, and save the effort and complexity of creating a temporary list for no good reason:

IEnumerable<string> LoadStuff() {
  foreach(var x in SomeExternalResource)
    yield return x;
}

It can also have huge performance advantages. If your code only happens to use the first 5 elements of the collection, then using yield will often avoid the effort of loading anything past that point. If you build a collection then return it, you waste a ton of time and space loading things you'll never need.

I could go on and on....

At a previous company, I found myself writing loops like this:

for (DateTime date = schedule.StartDate; date <= schedule.EndDate; 
     date = date.AddDays(1))

With a very simple iterator block, I was able to change this to:

foreach (DateTime date in schedule.DateRange)

It made the code a lot easier to read, IMO.

yield was developed for C#2 (before Linq in C#3).

We used it heavily in a large enterprise C#2 web application when dealing with data access and heavily repeated calculations.

Collections are great any time you have a few elements that you're going to hit multiple times.

However in lots of data access scenarios you have large numbers of elements that you don't necessarily need to pass round in a great big collection.

This is essentially what the SqlDataReader does - it's a forward only custom enumerator.

What yield lets you do is quickly and with minimal code write your own custom enumerators.

Everything yield does could be done in C#1 - it just took reams of code to do it.

Linq really maximises the value of the yield behaviour, but it certainly isn't the only application.

Whenever your function returns IEnumerable you should use "yielding". Not in .Net > 3.0 only.

.Net 2.0 example:

  public static class FuncUtils
  {
      public delegate T Func<T>();
      public delegate T Func<A0, T>(A0 arg0);
      public delegate T Func<A0, A1, T>(A0 arg0, A1 arg1);
      ... 

      public static IEnumerable<T> Filter<T>(IEnumerable<T> e, Func<T, bool> filterFunc)
      {
          foreach (T el in e)
              if (filterFunc(el)) 
                  yield return el;
      }


      public static IEnumerable<R> Map<T, R>(IEnumerable<T> e, Func<T, R> mapFunc)
      {
          foreach (T el in e) 
              yield return mapFunc(el);
      }
        ...

I'm not sure about C#'s implementation of yield(), but on dynamic languages, it's far more efficient than creating the whole collection. on many cases, it makes it easy to work with datasets much bigger than RAM.

I am a huge Yield fan in C#. This is especially true in large homegrown frameworks where often methods or properties return List that is a sub-set of another IEnumerable. The benefits that I see are:

the return value of a method that uses yield is immutable
you are only iterating over the list once
it a late or lazy execution variable, meaning the code to return the values are not executed until needed (though this can bite you if you dont know what your doing)
of the source list changes, you dont have to call to get another IEnumerable, you just iterate over IEnumeable again
many more

One other HUGE benefit of yield is when your method potentially will return millions of values. So many that there is the potential of running out of memory just building the List before the method can even return it. With yield, the method can just create and return millions of values, and as long the caller also doesnt store every value. So its good for large scale data processing / aggregating operations

Personnally, I haven't found I'm using yield in my normal day-to-day programming. However, I've recently started playing with the Robotics Studio samples and found that yield is used extensively there, so I also see it being used in conjunction with the CCR (Concurrency and Coordination Runtime) where you have async and concurrency issues.

Anyway, still trying to get my head around it as well.

Yield is useful because it saves you space. Most optimizations in programming makes a trade off between space (disk, memory, networking) and processing. Yield as a programming construct allows you to iterate over a collection many times in sequence without needing a separate copy of the collection for each iteration.

consider this example:

static IEnumerable<Person> GetAllPeople()
{
    return new List<Person>()
    {
        new Person() { Name = "George", Surname = "Bush", City = "Washington" },
        new Person() { Name = "Abraham", Surname = "Lincoln", City = "Washington" },
        new Person() { Name = "Joe", Surname = "Average", City = "New York" }
    };
}

static IEnumerable<Person> GetPeopleFrom(this IEnumerable<Person> people,  string where)
{
    foreach (var person in people)
    {
        if (person.City == where) yield return person;
    }
    yield break;
}

static IEnumerable<Person> GetPeopleWithInitial(this IEnumerable<Person> people, string initial)
{
    foreach (var person in people)
    {
        if (person.Name.StartsWith(initial)) yield return person;
    }
    yield break;
}

static void Main(string[] args)
{
    var people = GetAllPeople();
    foreach (var p in people.GetPeopleFrom("Washington"))
    {
        // do something with washingtonites
    }

    foreach (var p in people.GetPeopleWithInitial("G"))
    {
        // do something with people with initial G
    }

    foreach (var p in people.GetPeopleWithInitial("P").GetPeopleFrom("New York"))
    {
        // etc
    }
}

(Obviously you are not required to use yield with extension methods, it just creates a powerful paradigm to think about data.)

As you can see, if you have a lot of these "filter" methods (but it can be any kind of method that does some work on a list of people) you can chain many of them together without requiring extra storage space for each step. This is one way of raising the programming language (C#) up to express your solutions better.

The first side-effect of yield is that it delays execution of the filtering logic until you actually require it. If you therefore create a variable of type IEnumerable<> (with yields) but never iterate through it, you never execute the logic or consume the space which is a powerful and free optimization.

The other side-effect is that yield operates on the lowest common collection interface (IEnumerable<>) which enables the creation of library-like code with wide applicability.

Note that yield allows you to do things in a "lazy" way. By lazy, I mean that the evaluation of the next element in the IEnumberable is not done until the element is actually requested. This allows you the power to do a couple of different things. One is that you could yield an infinitely long list without the need to actually make infinite calculations. Second, you could return an enumeration of function applications. The functions would only be applied when you iterate through the list.

I've used yeild in non-linq code things like this (assuming functions do not live in same class):

public IEnumerable<string> GetData()
{
    foreach(String name in _someInternalDataCollection)
    {
        yield return name;
    }
}

...

public void DoSomething()
{
    foreach(String value in GetData())
    {
        //... Do something with value that doesn't modify _someInternalDataCollection
    }
}

You have to be careful not to inadvertently modify the collection that your GetData() function is iterating over though, or it will throw an exception.

Yield is very useful in general. It's in ruby among other languages that support functional style programming, so its like it's tied to linq. It's more the other way around, that linq is functional in style, so it uses yield.

I had a problem where my program was using a lot of cpu in some background tasks. What I really wanted was to still be able to write functions like normal, so that I could easily read them (i.e. the whole threading vs. event based argument). And still be able to break the functions up if they took too much cpu. Yield is perfect for this. I wrote a blog post about this and the source is available for all to grok :)

The System.Linq IEnumerable extensions are great, but sometime you want more. For example, consider the following extension:

public static class CollectionSampling
{
    public static IEnumerable<T> Sample<T>(this IEnumerable<T> coll, int max)
    {
        var rand = new Random();
        using (var enumerator = coll.GetEnumerator());
        {
            while (enumerator.MoveNext())
            {
                yield return enumerator.Current; 
                int currentSample = rand.Next(max);
                for (int i = 1; i <= currentSample; i++)
                    enumerator.MoveNext();
            }
        }
    }    
}

Another interesting advantage of yielding is that the caller cannot cast the return value to the original collection type and modify your internal collection

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow