Question

I want to break a string up into lines of a specified maximum length, without splitting any words, if possible (if there is a word that exceeds the maximum line length, then it will have to be split).

As always, I am acutely aware that strings are immutable and that one should preferably use the StringBuilder class. I have seen examples where the string is split into words and the lines are then built up using the StringBuilder class, but the code below seems "neater" to me.

I said "best" in the description rather than "most efficient" because I am also interested in the "eloquence" of the code. The strings will never be huge, generally splitting into two or three lines, and this won't be happening for thousands of lines.

Is the following code really bad?

private static IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
    stringToSplit = stringToSplit.Trim();
    var lines = new List<string>();

    while (stringToSplit.Length > 0)
    {
        if (stringToSplit.Length <= maximumLineLength)
        {
            lines.Add(stringToSplit);
            break;
        }

        var indexOfLastSpaceInLine = stringToSplit.Substring(0, maximumLineLength).LastIndexOf(' ');
        lines.Add(stringToSplit.Substring(0, indexOfLastSpaceInLine >= 0 ? indexOfLastSpaceInLine : maximumLineLength).Trim());
        stringToSplit = stringToSplit.Substring(indexOfLastSpaceInLine >= 0 ? indexOfLastSpaceInLine + 1 : maximumLineLength);
    }

    return lines.ToArray();
}

Solution 2

How about this as a solution:

IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
    var words = stringToSplit.Split(' ').Concat(new [] { "" });
    return
        words
            .Skip(1)
            .Aggregate(
                words.Take(1).ToList(),
                (a, w) =>
                {
                    var last = a.Last();
                    while (last.Length > maximumLineLength)
                    {
                        a[a.Count() - 1] = last.Substring(0, maximumLineLength);
                        last = last.Substring(maximumLineLength);
                        a.Add(last);
                    }
                    var test = last + " " + w;
                    if (test.Length > maximumLineLength)
                    {
                        a.Add(w);
                    }
                    else
                    {
                        a[a.Count() - 1] = test;
                    }
                    return a;
                });
}

I reworked this, as I prefer this version:

IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
    var words = stringToSplit.Split(' ');
    var line = words.First();
    foreach (var word in words.Skip(1))
    {
        var test = $"{line} {word}";
        if (test.Length > maximumLineLength)
        {
            yield return line;
            line = word;
        }
        else
        {
            line = test;
        }
    }
    yield return line;
}

OTHER TIPS

Even though this post is 3 years old, I wanted to give a better solution that uses Regex to accomplish the same thing.

If you want the string split so that the text can be displayed, you can use this:

public string SplitToLines(string stringToSplit, int maximumLineLength)
{
    return Regex.Replace(stringToSplit, @"(.{1," + maximumLineLength +@"})(?:\s|$)", "$1\n");
}

If on the other hand you need a collection you can use this:

public MatchCollection SplitToLines(string stringToSplit, int maximumLineLength)
{
    return Regex.Matches(stringToSplit, @"(.{1," + maximumLineLength +@"})(?:\s|$)");
}

NOTES

Remember to import regex (using System.Text.RegularExpressions;)

You can use string interpolation to build the pattern:
$@"(.{{1,{maximumLineLength}}})(?:\s|$)"

The MatchCollection works almost like an Array

Matching example with explanation here
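
To make the "works almost like an Array" note concrete, here is a minimal sketch that turns the MatchCollection into plain strings by reading capture group 1 of each match. The class name RegexLineSplitter is an assumption made for illustration; the pattern is the interpolated one from the note above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexLineSplitter
{
    public static List<string> SplitToLines(string stringToSplit, int maximumLineLength)
    {
        // Same pattern as in the answer, built with string interpolation.
        string pattern = $@"(.{{1,{maximumLineLength}}})(?:\s|$)";

        return Regex.Matches(stringToSplit, pattern)
            .Cast<Match>()                   // MatchCollection is non-generic on older frameworks
            .Select(m => m.Groups[1].Value)  // group 1 is the line without the trailing whitespace
            .ToList();
    }
}
```

For example, RegexLineSplitter.SplitToLines("ab cd ef", 5) should give "ab cd" and "ef". One caveat, as far as I can tell: a word longer than maximumLineLength cannot be matched from its first character, so the regex skips the start of that word rather than splitting it.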

I don't think your solution is too bad. I do, however, think you should break up your ternary into an if else because you are testing the same condition twice. Your code might also have a bug. Based on your description, it seems you want lines <= maxLineLength, but your code counts the space after the last word and uses it in the <= comparison resulting in effectively < behavior for the trimmed string.
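
To illustrate both points (one test of the condition instead of two ternaries, and the off-by-one on the space), here is a minimal sketch. It is not the asker's code: the class name LineSplitter is hypothetical, and it assumes that searching backwards from index maximumLineLength itself is the intended fix.

```csharp
using System;
using System.Collections.Generic;

public static class LineSplitter
{
    public static List<string> SplitToLines(string text, int maximumLineLength)
    {
        if (maximumLineLength < 1)
            throw new ArgumentOutOfRangeException(nameof(maximumLineLength));

        var lines = new List<string>();
        text = text.Trim();

        while (text.Length > maximumLineLength)
        {
            // Search backwards starting AT index maximumLineLength, so a space that
            // directly follows a full-width line still counts as a break point.
            int breakAt = text.LastIndexOf(' ', maximumLineLength);
            if (breakAt <= 0)
            {
                breakAt = maximumLineLength; // no space found: hard-split the long word
            }

            lines.Add(text.Substring(0, breakAt).TrimEnd());
            text = text.Substring(breakAt).TrimStart();
        }

        if (text.Length > 0)
        {
            lines.Add(text);
        }
        return lines;
    }
}
```

With the input "ab cd ef" and a maximum of 5, this sketch returns "ab cd" and "ef", whereas the code in the question returns "ab" and "cd ef" because it only looks for a space inside the first 5 characters.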

Here is my solution.

private static IEnumerable<string> SplitToLines(string stringToSplit, int maxLineLength)
    {
        string[] words = stringToSplit.Split(' ');
        StringBuilder line = new StringBuilder();
        foreach (string word in words)
        {
            if (word.Length + line.Length <= maxLineLength)
            {
                line.Append(word + " ");
            }
            else
            {
                if (line.Length > 0)
                {
                    yield return line.ToString().Trim();
                    line.Clear();
                }
                string overflow = word;
                while (overflow.Length > maxLineLength)
                {
                    yield return overflow.Substring(0, maxLineLength);
                    overflow = overflow.Substring(maxLineLength);
                }
                line.Append(overflow + " ");
            }
        }
        yield return line.ToString().Trim();
    }

It is a bit longer than your solution, but it should be more straightforward. It also uses a StringBuilder so it is much faster for large strings. I performed a benchmarking test for 20,000 words ranging from 1 to 11 characters each split into lines of 10 character width. My method completed in 14ms compared to 1373ms for your method.
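
If you want to repeat this kind of measurement yourself, a rough harness could look like the sketch below. It is not the code used for the numbers above: the class SplitBenchmark, its method names, and the fixed seed are assumptions made for illustration, and timings will differ from machine to machine.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;

public static class SplitBenchmark
{
    // Builds a space-separated string of wordCount words, each 1..maxWordLength characters long.
    public static string BuildInput(int wordCount, int maxWordLength, int seed = 42)
    {
        var random = new Random(seed);
        var sb = new StringBuilder();
        for (int i = 0; i < wordCount; i++)
        {
            if (i > 0) sb.Append(' ');
            sb.Append('x', random.Next(1, maxWordLength + 1)); // upper bound is exclusive
        }
        return sb.ToString();
    }

    // Times one full enumeration of the splitter's result.
    public static TimeSpan Time(Func<string, int, IEnumerable<string>> splitter, string input, int maxLineLength)
    {
        var stopwatch = Stopwatch.StartNew();
        splitter(input, maxLineLength).Count(); // Count() forces lazy iterators to run to the end
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

A call such as SplitBenchmark.Time(SplitToLines, SplitBenchmark.BuildInput(20000, 11), 10) roughly mirrors the setup described above. Run it in Release mode and repeat it a few times, since the first call includes JIT compilation.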

Try this (untested)

    private static IEnumerable<string> SplitToLines(string value, int maximumLineLength)
    {
        var words = value.Split(' ');
        var line = new StringBuilder();

        foreach (var word in words)
        {
            // Flush the current line when a space plus the next word would overflow it.
            // The line.Length > 0 check avoids emitting an empty first line.
            if (line.Length > 0 && line.Length + 1 + word.Length > maximumLineLength)
            {
                yield return line.ToString();
                line.Clear();
            }

            if (line.Length > 0) line.Append(' ');
            line.Append(word);
        }

        yield return line.ToString();
    }

  • ~6x faster than the accepted answer
  • More than 1.5x faster than the Regex version in Release mode (depending on line length)
  • Optionally keeps the space at the end of the line or not (the Regex version always keeps it)

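    static IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength, bool removeSpace = true)
        {
            int start = 0; // first character of the line being built
            int end = 0;   // last break candidate (space/newline) that still fits
            // Run one step past the end so the tail is length-checked like any other word.
            for (int i = 0; i <= stringToSplit.Length; i++)
            {
                bool atEnd = i == stringToSplit.Length;
                if (atEnd || stringToSplit[i] == ' ' || stringToSplit[i] == '\n')
                {
                    // end > start guards against a negative length when a single word is too long.
                    if (i - start > maximumLineLength && end > start)
                    {
                        yield return stringToSplit.Substring(start, end - start);
                        start = removeSpace ? end + 1 : end; // + 1 to remove the space on the next line
                    }
                    end = i; // this delimiter is always the next break candidate
                }
            }
            yield return stringToSplit.Substring(start); // remember last line
        }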

Here is the example code used to test speeds (again, run on your own machine and test in Release mode to get accurate timings) https://dotnetfiddle.net/h5I1GC
Timings on my machine in Release mode, .NET 4.8:

Accepted Answer: 667ms
Regex: 368ms
My Version: 117ms

My requirement was to have a line break at the last space before the 30-character limit. Here is how I did it; I hope it helps anyone looking.

private string LineBreakLongString(string input)
{
    const int limit = 30;

    // Nothing to break if the whole string already fits.
    if (input.Length <= limit)
    {
        return input;
    }

    // Last space before the limit (searches backwards from index limit - 1).
    int breakAt = input.LastIndexOf(' ', limit - 1);
    if (breakAt <= 0)
    {
        return input; // no usable space before the limit, so leave the string alone
    }

    return input.Substring(0, breakAt) + System.Environment.NewLine + input.Substring(breakAt).Trim();
}

An approach using a recursive method and ReadOnlySpan (tested):

public static void SplitToLines(ReadOnlySpan<char> stringToSplit, int index, ref List<string> values)
{
   if (stringToSplit.IsEmpty || index < 1) return;
   var nextIndex = stringToSplit.IndexOf(' ');
   var slice = stringToSplit.Slice(0, nextIndex < 0 ? stringToSplit.Length : nextIndex);

   if (slice.Length <= index)
   {
      values.Add(slice.ToString());
      nextIndex++;
   }
   else
   {
      values.Add(slice.Slice(0, index).ToString());
      nextIndex = index;
   }

   if (stringToSplit.Length <= index) return;
   SplitToLines(stringToSplit.Slice(nextIndex), index, ref values);
}
Licensed under: CC-BY-SA with attribution