Question

Let's assume you have a function that returns a lazily enumerated sequence:

struct AnimalCount
{
    public int Chickens;
    public int Goats;
}

IEnumerable<AnimalCount> FarmsInEachPen()
{
    ....
    yield return new AnimalCount { Chickens = x, Goats = y };
    ....
}

You also have two functions that consume two separate IEnumerables, for example:

ConsumeChicken(IEnumerable<int>);
ConsumeGoat(IEnumerable<int>);

How can you call ConsumeChicken and ConsumeGoat (a) without converting the result of FarmsInEachPen() with ToList() beforehand, because it might have two zillion records, and (b) without multithreading?

Basically:

ConsumeChicken(FarmsInEachPen().Select(x => x.Chickens));
ConsumeGoat(FarmsInEachPen().Select(x => x.Goats));

But without forcing the double enumeration.

I can solve it with multithreading, but that gets unnecessarily complicated, with a buffer queue for each list.

So I'm looking for a way to split the AnimalCount enumerator into two int enumerators without fully evaluating AnimalCount. There is no problem running ConsumeGoat and ConsumeChicken together in lock-step.

I can feel the solution just out of my grasp, but I'm not quite there. I'm thinking along the lines of a helper function that returns an IEnumerable to be fed into ConsumeChicken; each time the iterator is advanced, it internally calls ConsumeGoat, thus executing the two functions in lock-step. Except, of course, I don't want to call ConsumeGoat more than once.

Solution 4

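I figured it out, thanks in large part to the path that @Lee put me on.

You need to share a single enumerator between the two sequences passed to Zip, and use an adapter function to project the correct element into each sequence. Note that the consumers themselves are written as iterators (they yield after each item), which is what lets Zip advance them alternately.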

private static IEnumerable<object> ConsumeChickens(IEnumerable<int> xList)
{
    foreach (var x in xList)
    {
        Console.WriteLine("X: " + x);
        yield return null;
    }
}

private static IEnumerable<object> ConsumeGoats(IEnumerable<int> yList)
{
    foreach (var y in yList)
    {
        Console.WriteLine("Y: " + y);
        yield return null;
    }
}

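// i == 0: the "driver" sequence; it advances the shared enumerator and yields Chickens.
// i != 0: the "follower" sequence; it never advances, it only reads Goats from the current item.
private static IEnumerable<int> SelectHelper(IEnumerator<AnimalCount> enumerator, int i)
{
    bool c = i != 0 || enumerator.MoveNext();
    while (c)
    {
        if (i == 0)
        {
            yield return enumerator.Current.Chickens;
            c = enumerator.MoveNext();
        }
        else
        {
            // Never ends by itself; Zip stops pulling once the driver sequence ends.
            yield return enumerator.Current.Goats;
        }
    }
}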

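private static void Main(string[] args)
{
    // One enumerator, shared by both helper sequences.
    var enumerator = GetAnimals().GetEnumerator();

    var chickensList = ConsumeChickens(SelectHelper(enumerator, 0));
    var goatsList = ConsumeGoats(SelectHelper(enumerator, 1));

    // Zip pulls from chickensList first, then goatsList, so the two consumers run in lock-step.
    var temp = chickensList.Zip(goatsList, (i, i1) => (object) null);
    temp.ToList(); // forces the lazy Zip to run

    // iterations is presumably a static counter incremented inside GetAnimals() (not shown).
    Console.WriteLine("Total iterations: " + iterations);
}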
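The listing above calls GetAnimals() and prints an iterations counter, neither of which the answer shows. A minimal stand-in, purely an assumption made so the listing can be tried out (the class name AnimalSource and the three pens of data are hypothetical), might look like this:

```csharp
using System.Collections.Generic;

public struct AnimalCount
{
    public int Chickens;
    public int Goats;
}

public static class AnimalSource
{
    // Hypothetical counter: how many items the source has produced so far.
    public static int iterations;

    // Hypothetical stand-in for GetAnimals(); the three pens are made-up data.
    public static IEnumerable<AnimalCount> GetAnimals()
    {
        for (int pen = 1; pen <= 3; pen++)
        {
            iterations++;
            yield return new AnimalCount { Chickens = pen, Goats = pen * 10 };
        }
    }
}
```

With these two members moved into the class that holds Main, the listing should print X: 1, Y: 10, X: 2, Y: 20, X: 3, Y: 30 and then "Total iterations: 3", which shows that the source was walked only once.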

OTHER TIPS

I don't think there is a way to do what you want, since ConsumeChickens(IEnumerable<int>) and ConsumeGoats(IEnumerable<int>) are being called sequentially, each of them enumerating a list separately - how do you expect that to work without two separate enumerations of the list?

Depending on the situation, a better solution is to have ConsumeChicken(int) and ConsumeGoat(int) methods (which each consume a single item), and call them in alternation. Like this:

foreach(var animal in animals)
{
    ConsumeChicken(animal.Chickens);
    ConsumeGoat(animal.Goats);
}

This will enumerate the animals collection only once.


Also, a note: depending on your LINQ provider and exactly what you're trying to do, there may be better options. For example, if you're trying to get the total number of both chickens and goats from a database using LINQ to SQL or LINQ to Entities, the following query:

from a in animals
group a by 0 into g
select new 
{
    TotalChickens = g.Sum(x => x.Chickens), 
    TotalGoats = g.Sum(x => x.Goats)
}

will result in a single query, and do the summation on the database-end, which is greatly preferable to pulling the entire table over and doing the summation on the client end.
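For an in-memory sequence (LINQ to Objects) the same two totals can also be computed in a single pass. The following is only a sketch under that assumption; the Totals class and its Sum method are hypothetical names, and the accumulator is a value tuple that carries both running sums:

```csharp
using System.Collections.Generic;
using System.Linq;

public struct AnimalCount
{
    public int Chickens;
    public int Goats;
}

public static class Totals
{
    // Folds the sequence once, carrying both running sums in a tuple.
    public static (int TotalChickens, int TotalGoats) Sum(IEnumerable<AnimalCount> animals)
    {
        return animals.Aggregate(
            (TotalChickens: 0, TotalGoats: 0),
            (acc, a) => (acc.TotalChickens + a.Chickens, acc.TotalGoats + a.Goats));
    }
}
```

Because Aggregate visits each element exactly once, the lazy source is enumerated a single time and nothing is buffered.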

The way you have posed your problem, there is no way to do this. IEnumerable<T> is a pull enumerable: you call GetEnumerator to get to the front of the sequence and then repeatedly ask "Give me the next item" (MoveNext/Current). You can't, on one thread, have two different things pulling from animals.Select(a => a.Chickens) and animals.Select(a => a.Goats) at the same time. You would have to do one and then the other (which would require materializing the second).
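To illustrate the pull model described above, here is a small sketch. PullDemo, Pens and SumByHand are hypothetical names, and tuples stand in for AnimalCount. It pulls items by hand with MoveNext/Current and counts how many items the lazy source produces:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PullDemo
{
    // Counts how many items the lazy source has produced in total.
    public static int Produced;

    // Hypothetical stand-in for FarmsInEachPen(), using tuples instead of AnimalCount.
    public static IEnumerable<(int Chickens, int Goats)> Pens()
    {
        for (int i = 1; i <= 3; i++)
        {
            Produced++;
            yield return (i, i * 10);
        }
    }

    // Pulls items one by one, the way foreach does behind the scenes.
    public static int SumByHand(IEnumerable<int> source)
    {
        int total = 0;
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            while (e.MoveNext())      // "give me the next item"
            {
                total += e.Current;   // read the item just produced
            }
        }
        return total;
    }
}
```

Calling SumByHand on Pens().Select(p => p.Chickens) and then on Pens().Select(p => p.Goats) returns 6 and 60, but leaves Produced at 6 rather than 3: each consumer obtains its own enumerator, so the source runs from the start twice.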

The suggestion BlueRaja made is one way to change the problem slightly. I would suggest going that route.

The other alternative is to utilize IObservable<T> from Microsoft's reactive extensions (Rx), a push enumerable. I won't go into the details of how you would do that, but it's something you could look into.

Edit:

The above is assuming that ConsumeChickens and ConsumeGoats are both returning void or are at least not returning IEnumerable<T> themselves - which seems like an obvious assumption. I'd appreciate it if the lame downvoter would actually comment.

Actually, the simplest way to achieve what you want is to convert the return value of FarmsInEachPen to a push collection (IObservable) and use Reactive Extensions to work with it:

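// Requires the Rx library (System.Reactive); Subject<T> lives in System.Reactive.Subjects.
var observable = new Subject<AnimalCount>();
// Subscribe attaches each handler; Do on its own would never run without a subscription.
observable.Subscribe(x => DoSomethingWithChicken(x.Chickens));
observable.Subscribe(x => DoSomethingWithGoat(x.Goats));

foreach (var item in FarmsInEachPen())
{
    observable.OnNext(item);
}
observable.OnCompleted();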
Licensed under: CC-BY-SA with attribution