Question

Is there a way, using the pre-existing LINQ functions, to create arbitrarily sized groups from a list of items?

For instance:

[1,2,3,4,5,6,7]

Something like list.Group(3) would produce an IEnumerable of IEnumerables that looks like the sequence below.

[[1,2,3],[4,5,6],[7]]

Solution

We've got this in MoreLINQ as Batch.

var batch = source.Batch(3);

As you can see from its implementation, it's not entirely trivial to implement efficiently with the "standard" LINQ operators, but it's clearly doable. Note that it involves buffering the input, as the resulting sequences need to be independent.
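For illustration, a minimal sketch of such a buffering operator might look like this (a simplified illustration, not MoreLINQ's actual source; Chunk is just an illustrative name):

// Simplified sketch of a buffering batch operator (not MoreLINQ's actual
// source; "Chunk" is just an illustrative name).
static IEnumerable<IEnumerable<T>> Chunk<T>(IEnumerable<T> source, int size)
{
    List<T> bucket = null;
    foreach (var item in source)
    {
        if (bucket == null)
            bucket = new List<T>(size);
        bucket.Add(item);
        if (bucket.Count == size)
        {
            yield return bucket;   // hand out the full buffer...
            bucket = null;         // ...and start a fresh one
        }
    }
    if (bucket != null)
        yield return bucket;       // leftover items form a final, smaller batch
}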

If you do want to do it with just the standard operators, a less efficient implementation would be:

// Assume "size" is the batch size
var query = source.Select((value, index) => new { value, index })
                  .GroupBy(pair => pair.index / size, pair => pair.value);

EDIT: Just to show why this is safer than John Fisher's answer, here's a short but complete program demonstrating the difference:

using System;
using System.Collections.Generic;
using System.Linq;

public class Program
{
    public static void Main(string[] args)
    {
        int[] source = { 1, 2, 3, 4, 5, 6, 7 };

        var skeet = SkeetAnswer(source, 3);
        var fisher = FisherAnswer(source, 3);

        Console.WriteLine("Size of the first element of Skeet's solution:");
        Console.WriteLine(skeet.First().Count());
        Console.WriteLine(skeet.First().Count());
        Console.WriteLine(skeet.First().Count());

        Console.WriteLine("Size of the first element of Fisher's solution:");
        Console.WriteLine(fisher.First().Count());
        Console.WriteLine(fisher.First().Count());
        Console.WriteLine(fisher.First().Count());
    }

    static IEnumerable<IEnumerable<int>> SkeetAnswer(IEnumerable<int> source,
                                                     int size)
    {
        return source.Select((value, index) => new { value, index })
                     .GroupBy(pair => pair.index / size, pair => pair.value);
    }

    static IEnumerable<IEnumerable<int>> FisherAnswer(IEnumerable<int> source,
                                                      int size)
    {
        int index = 0;
        return source.GroupBy(x => (index++ / size));
    }
}

Results:

Size of the first element of Skeet's solution:
3
3
3
Size of the first element of Fisher's solution:
3
2
1

While you could call ToList() at the end, at that point you've lost the efficiency gains of the approach: basically, John's approach avoids creating an instance of an anonymous type for each member. This could be mitigated by using a value-type equivalent of Tuple<,>, so that no more objects would be created, just pairs of values wrapped in another value. There'd still be a slight increase in the time required to do the projection and then the grouping.
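For example, on a compiler that supports value tuples (C# 7 or later), a sketch of that mitigation might look like this:

// Sketch, assuming C# 7+ value tuples: pairs each element with its index
// without allocating an anonymous (reference-type) object per element.
var query = source.Select((value, index) => (value, index))
                  .GroupBy(pair => pair.index / size, pair => pair.value);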

This is a good demonstration of why having side-effects in your LINQ queries (in this case the modification to the captured variable index) is a bad idea.

Another alternative would be to write an implementation of GroupBy which provides the index of each element for the key projection. That's what's so nice about LINQ - there are so many options!
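As a rough sketch, such an index-aware overload could look like the following (GroupByWithIndex is a hypothetical name; this version delegates to the standard operators, whereas a hand-rolled one could group directly without the intermediate projection):

// Hypothetical extension method (declared in a static class): a GroupBy
// variant whose key selector also receives the element's index.
public static IEnumerable<IGrouping<TKey, TSource>> GroupByWithIndex<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, int, TKey> keySelector)
{
    return source.Select((item, index) => new { item, index })
                 .GroupBy(pair => keySelector(pair.item, pair.index),
                          pair => pair.item);
}

// Usage: source.GroupByWithIndex((value, index) => index / size)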

OTHER TIPS

I don't think there are any built-in methods to do this, but it's not too difficult to implement. The grouping method you are referring to does something more like a SQL GROUP BY; what you are describing is often called chunking.

You can refactor this code into an extension method and use it on a List<T> (see the sketch after the snippet):

int i = 0;
var result = list
    .Select(p => new { Counter = i++, Item = p })
    .Select(p => new { Group = p.Counter / 3, Item = p.Item })
    .GroupBy(p => p.Group)
    .Select(p => p.Select(q => q.Item))
    .ToList();
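
A sketch of that extension-method refactoring might look like this (ChunkBy is just an illustrative name, not a standard LINQ operator; it uses Select's index overload instead of a captured counter so that re-enumerating the result stays safe):

// Hypothetical extension method; "ChunkBy" is an illustrative name.
// Using Select's (item, index) overload avoids the captured counter
// and its re-enumeration pitfall.
public static IEnumerable<IEnumerable<T>> ChunkBy<T>(this IEnumerable<T> source, int size)
{
    return source.Select((item, index) => new { Group = index / size, Item = item })
                 .GroupBy(p => p.Group)
                 .Select(g => g.Select(p => p.Item));
}

// Usage: var result = list.ChunkBy(3).ToList();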

This should do it.

//var items = new int[] { 1, 2, 3, 4, 5, 6, 7 };
int index = 0;
var grouped = items.GroupBy(x => (index++ / 3));

There is no need to bother with the extra Select steps from the other answers; they waste memory and time just to create throwaway objects with an extra index value.

Edit:

As Jon Skeet mentioned, iterating over grouped twice could cause problems (when the items being grouped don't divide cleanly into the group size, which is 3 in this example).

To mitigate that, you could use what he suggests, or you could call ToList() on the results. (Technically, you could also reset the index to zero each time you iterate over the group, but that's a nasty code smell.)
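For instance, a minimal sketch of the ToList() mitigation:

// Sketch: materializing the outer sequence runs the grouping exactly once,
// so the index++ side effect is evaluated only during this single pass.
var items = new[] { 1, 2, 3, 4, 5, 6, 7 };
int index = 0;
var grouped = items.GroupBy(x => index++ / 3).ToList();

Console.WriteLine(grouped.First().Count());   // 3
Console.WriteLine(grouped.First().Count());   // still 3 on re-enumeration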

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow