Question

I really like the algorithm shown below for splitting a list into sublists of a fixed size. It might not be the most an efficient algorithm (edit: at all).

I'd like something that has a good balance of readability, elegance, and performance. The problem is, most algorithms I find in C# require the yield keyword, which is not available if you're using .NET 3.5 in Visual Studio 2010 ;)

public IEnumerable<IEnumerable<T>> Partition<T>(IEnumerable<T> source, int size)
{
    if (source == null)
        throw new ArgumentNullException("list");

    if (size < 1)
        throw new ArgumentOutOfRangeException("size");

    int index = 1;
    IEnumerable<T> partition = source.Take(size).AsEnumerable();

    while (partition.Any())
    {
        yield return partition;
        partition = source.Skip(index++ * size).Take(size).AsEnumerable();
    }
}

I tried rewriting this in VB, but had to use a second list to collect results into, which ended up taking significantly more time than the implementation above.

I'm looking for another algorithm I could use in VB.NET, but most of the results run into the issue of having to basically load everything into the memory instead of the ability to dynamically produce the results a la generators in python. Not an huge issue, but not as ideal as it would be with yield return.

Is there a good, recommended algorithm for doing this in VB.NET? Would I have to create something implementing IEnumerator to generate the results on demand?

Was it helpful?

Solution 2

Since I'm using the partitions locally, I ended up chosing to invert the situation: instead of passing the partitions to the action using it, I can pass the action to the function doing the partitioning.

Public Sub DoForPartition(Of T)(source As IEnumerable(Of T), 
                                size As Integer, 
                                doThis As Action(Of IEnumerable(Of T)))

    Dim partition(size - 1) As T
    Dim count = 0

    For Each t in source
        partition(count) = t
        count += 1

        If count = size Then
            doThis(partition)
            count = 0
        End If
    Next

    If count > 0 Then
        Array.Resize(partition, count)
        doThis(partition)
    End If
End Sub

This function avoids looping through the source multiple times and the only memory overhead is the size of the partition (instead of the entire source like some of the other options). I didn't write this function myself, but adapted a similarly looking C# function from this answer.

This looks like a much better algorithm than the one in my question.

OTHER TIPS

This might work as a work around. Make the sub routine a Sub and pass the target list in. now you can add the sub lists to it directly without creating a whole intermediary object first.

Dim testlist As New List(Of List(Of Integer))
Partition(Of Integer)({1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0}, 4, testlist)

Public Sub Partition(Of T)(source As IEnumerable(Of T), size As Integer, ByRef input As List(Of List(Of T)))
    If source Is Nothing Then
        Throw New ArgumentNullException("list")
    End If
    If size < 1 Then
        Throw New ArgumentOutOfRangeException("size")
    End If
    For i = 0 To source.Count - 1 Step size
        input.Add(source.Skip(i).Take(size).ToList())
    Next
End Sub

Lacking the Yield, you can do the following. I'm assuming you can convert to VB

public IEnumerable<IEnumerable<T>> Partition<T>(IEnumerable<T> source, int size)
{
    if (source == null)
        throw new ArgumentNullException("list");

    if (size < 1)
        throw new ArgumentOutOfRangeException("size");

    List<List<T>> partitions = new List<List<T>>();

    int index = 1;
    IEnumerable<T> partition = source.Take(size).AsEnumerable();

    while (partition.Any())
    {
        partitions.Add(partition);
        partition = source.Skip(index++ * size).Take(size).AsEnumerable();
    }

    return partitions;
}

Note that this isn't an exact translation. The original code returns one partition at a time. This code creates all of the partitions and then returns them all in a list. Not a problem if the source isn't huge.

That said, it will look like the same thing to your code. That is, if you have:

foreach (IEnumerable<Foo> part in Partition(FooList, 100))
{
    // whatever
}

both versions will do essentially the same thing. The real difference is that my translation above will work as though you had written:

foreach (IEnumerable<Foo> part in Partition(FooList, 100).ToList())

As somebody pointed out in comments, this isn't the most efficient way to do things because the Skip could end up having to iterate over items multiple times. Again, if the source list isn't huge, then that probably isn't a problem. And it's likely that Skip is optimized to do direct indexing for things that implement IList.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top