Question

Maybe a basic question but let us say I have a string that is 2000 characters long, I need to split this string into max 512 character chunks each.

Is there a nice way, like a loop or so for doing this?

Was it helpful?

Solution

Something like this:

private IList<string> SplitIntoChunks(string text, int chunkSize)
{
    List<string> chunks = new List<string>();
    int offset = 0;
    while (offset < text.Length)
    {
        int size = Math.Min(chunkSize, text.Length - offset);
        chunks.Add(text.Substring(offset, size));
        offset += size;
    }
    return chunks;
}

Or just to iterate over:

private IEnumerable<string> SplitIntoChunks(string text, int chunkSize)
{
    int offset = 0;
    while (offset < text.Length)
    {
        int size = Math.Min(chunkSize, text.Length - offset);
        yield return text.Substring(offset, size);
        offset += size;
    }
}

Note that this splits into chunks of UTF-16 code units, which isn't quite the same as splitting into chunks of Unicode code points, which in turn may not be the same as splitting into chunks of glyphs.

OTHER TIPS

Though this question meanwhile has an accepted answer, here's a short version with the help of regular expressions. Purists may not like it (understandably) but when you need a quick solution and you are handy with regexes, this can be it. Performance is rather good, surprisingly:

string [] split = Regex.Split(yourString, @"(?<=\G.{512})");

What it does? Negative look-backward and remembering the last position with \G. It will also catch the last bit, even if it isn't dividable by 512.

using Jon's implementation and the yield keyword.

IEnumerable<string> Chunks(string text, int chunkSize)
{
    for (int offset = 0; offset < text.Length; offset += chunkSize)
    {
        int size = Math.Min(chunkSize, text.Length - offset);
        yield return text.Substring(offset, size);
    }
}
static IEnumerable<string> Split(string str, int chunkSize)    
{   
    int len = str.Length;
    return Enumerable.Range(0, len / chunkSize).Select(i => str.Substring(i * chunkSize, chunkSize));    
}

source: Splitting a string into chunks of a certain size

I will dare to provide a more LINQified version of Jon's solution, based on the fact that the string type implements IEnumerable<char>:

private IList<string> SplitIntoChunks(string text, int chunkSize)
{
    var chunks = new List<string>();
    int offset = 0;
    while(offset < text.Length) {
        chunks.Add(new string(text.Skip(offset).Take(chunkSize).ToArray()));
        offset += chunkSize;
    }
    return chunks;
}

Most of the answer may have the same flaw. Given an empty text they will yield nothing. We (I) expect at least to get back that empty string (same behaviour as a split on a char not in the string, which will give back one item : that given string)

so we should loop at least once all times (based on Jon's code) :

IEnumerable<string> SplitIntoChunks (string text, int chunkSize)
{
    int offset = 0;
    do
    {
        int size = Math.Min (chunkSize, text.Length - offset);
        yield return text.Substring (offset, size);
        offset += size;
    } while (offset < text.Length);
}

or using a for (Edited : after toying a little more with this, I found a better way to handle the case chunkSize greater than text) :

IEnumerable<string> SplitIntoChunks (string text, int chunkSize)
{
    if (text.Length <= chunkSize)
        yield return text;
    else
    {
        var chunkCount = text.Length / chunkSize;
        var remainingSize = text.Length % chunkSize;

        for (var offset = 0; offset < chunkCount; ++offset)
            yield return text.Substring (offset * chunkSize, chunkSize);

        // yield remaining text if any
        if (remainingSize != 0)
            yield return text.Substring (chunkCount * chunkSize, remainingSize);
    }
}

That could also be used with the do/while loop ;)

Generic extension method:

using System;
using System.Collections.Generic;
using System.Linq;

public static class IEnumerableExtensions
{
  public static IEnumerable<IEnumerable<T>> SplitToChunks<T> (this IEnumerable<T> coll, int chunkSize)
  {
    int skipCount = 0;
    while (coll.Skip (skipCount).Take (chunkSize) is IEnumerable<T> part && part.Any ())
    {
      skipCount += chunkSize;
      yield return part;
    }
  }
}

class Program
{
  static void Main (string[] args)
  {
    var col = Enumerable.Range(1,1<<10);
    var chunks = col.SplitToChunks(8);

    foreach (var c in chunks.Take (200))
    {
      Console.WriteLine (string.Join (" ", c.Select (n => n.ToString ("X4"))));
    }

    Console.WriteLine ();
    Console.WriteLine ();

    "Split this text into parts that are fifteen characters in length, surrounding each part with single quotes and output each into the console on seperate lines."
      .SplitToChunks (15)
      .Select(p => $"'{string.Concat(p)}'")
      .ToList ()
      .ForEach (p => Console.WriteLine (p));

    Console.ReadLine ();
  }
}

Something like?

Calculate eachLength = StringLength / WantedCharLength
Then for (int i = 0; i < StringLength; i += eachLength)
SubString (i, eachLength);
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top