Question

Does C# have built-in support for parsing strings of page numbers? By page numbers, I mean the format you might enter into a print dialog that's a mixture of comma and dash-delimited.

Something like this:

1,3,5-10,12

What would be really nice is a solution that gave me back some kind of list of all page numbers represented by the string. In the above example, getting a list back like this would be nice:

1,3,5,6,7,8,9,10,12

I just want to avoid rolling my own if there's an easy way to do it.


Solution

Should be simple:

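// requires: using System.Collections.Generic; using System.Linq;
// yield return is only legal inside an iterator method, so the loop is wrapped in one.
public static IEnumerable<int> ParsePageNumbers( string input )
{
    foreach( string s in input.Split(',') )
    {
        // try and get the number
        int num;
        if( int.TryParse( s, out num ) )
        {
            yield return num;
            continue; // skip the rest
        }

        // otherwise we might have a range
        // split on the range delimiter
        string[] subs = s.Split('-');
        int start, end;

        // now see if we can parse a start and end
        // (anything that fails these checks is silently skipped)
        if( subs.Length > 1 &&
            int.TryParse(subs[0], out start) &&
            int.TryParse(subs[1], out end) &&
            end >= start )
        {
            // create a range between the two values
            int rangeLength = end - start + 1;
            foreach(int i in Enumerable.Range(start, rangeLength))
            {
                yield return i;
            }
        }
    }
}

// usage: foreach( int page in ParsePageNumbers( "1,3,5-10,12" ) ) { ... }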

Edit: thanks for the fix ;-)

OTHER TIPS

It doesn't have a built-in way to do this, but it would be trivial to do using String.Split.

Simply split on ',' and you have a series of strings that each represent either a page number or a range. Iterate over that series and do a String.Split on '-'. If the split yields only one piece, it's a plain page number, so stick it in your list of pages. If it yields two pieces, take the left and right of the '-' as the bounds and use a simple for loop to add each page number in that range to your final list.

Can't take but 5 minutes to do, then maybe another 10 to add in some sanity checks to throw errors when the user tries to input invalid data (like "1-2-3" or something.)
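
The steps described above might look roughly like this. It is only a sketch: the class and method names (PageListSketch, Expand, ParsePage) are made up for illustration, and throwing FormatException on bad input such as "1-2-3" is one possible choice of sanity check, not the only one.

```csharp
using System;
using System.Collections.Generic;

public static class PageListSketch
{
    // Hypothetical helper: expands "1,3,5-10,12" into individual page numbers.
    public static List<int> Expand(string input)
    {
        var pages = new List<int>();
        foreach (string part in input.Split(','))
        {
            string[] bounds = part.Split('-');
            if (bounds.Length == 1)
            {
                // No dash: a plain page number.
                pages.Add(ParsePage(bounds[0], part));
            }
            else if (bounds.Length == 2)
            {
                // One dash: left and right are the bounds of the range.
                int first = ParsePage(bounds[0], part);
                int last = ParsePage(bounds[1], part);
                if (first > last)
                    throw new FormatException("Range '" + part + "' runs backwards.");
                for (int page = first; page <= last; page++)
                    pages.Add(page);
            }
            else
            {
                // Input such as "1-2-3" has too many dashes.
                throw new FormatException("'" + part + "' is not a page or a range.");
            }
        }
        return pages;
    }

    // Hypothetical helper: parses one bound, reporting the whole part on failure.
    private static int ParsePage(string text, string part)
    {
        int value;
        if (!int.TryParse(text, out value))
            throw new FormatException("'" + part + "' is not a page or a range.");
        return value;
    }
}
```

Calling PageListSketch.Expand("1,3,5-10,12") gives the list 1,3,5,6,7,8,9,10,12 from the question. Note that this sketch treats an open range like "-5" as an error, because the left bound is empty.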

Keith's approach seems nice. I put together a more naive approach using lists. This has error checking so hopefully should pick up most problems:-

public List<int> parsePageNumbers(string input) {
  if (string.IsNullOrEmpty(input))
    throw new InvalidOperationException("Input string is empty.");

  var pageNos = input.Split(',');

  var ret = new List<int>();
  foreach(string pageString in pageNos) {
    if (pageString.Contains("-")) {
      parsePageRange(ret, pageString);
    } else {
      ret.Add(parsePageNumber(pageString));
    }
  }

  ret.Sort();
  return ret.Distinct().ToList();
}

private int parsePageNumber(string pageString) {
  int ret;

  if (!int.TryParse(pageString, out ret)) {
    throw new InvalidOperationException(
      string.Format("Page number '{0}' is not valid.", pageString));
  }

  return ret;
}

private void parsePageRange(List<int> pageNumbers, string pageNo) {
  var pageRange = pageNo.Split('-');

  if (pageRange.Length != 2)
    throw new InvalidOperationException(
      string.Format("Page range '{0}' is not valid.", pageNo));

  int startPage = parsePageNumber(pageRange[0]),
    endPage = parsePageNumber(pageRange[1]);

  if (startPage > endPage) {
    throw new InvalidOperationException(
      string.Format("Page number {0} is greater than page number {1}" +
      " in page range '{2}'", startPage, endPage, pageNo));
  }

  pageNumbers.AddRange(Enumerable.Range(startPage, endPage - startPage + 1));
}

Below is the code I just put together to do this. It accepts sloppy input like 1-2,5abcd,6,7,20-15,,,,,, (non-digit characters are stripped, reversed ranges are reordered, and empty entries are skipped).

It's easy to extend for other formats.

    private int[] ParseRange(string ranges)
    { 
        string[] groups = ranges.Split(',');
        return groups.SelectMany(t => GetRangeNumbers(t)).ToArray();
    }

    private int[] GetRangeNumbers(string range)
    {
        //string justNumbers = new String(text.Where(Char.IsDigit).ToArray());

        int[] RangeNums = range
            .Split('-')
            .Select(t => new String(t.Where(Char.IsDigit).ToArray())) // Digits Only
            .Where(t => !string.IsNullOrWhiteSpace(t)) // Only if has a value
            .Select(t => int.Parse(t)) // digit to int
            .ToArray();
        return RangeNums.Length.Equals(2) ? Enumerable.Range(RangeNums.Min(), (RangeNums.Max() + 1) - RangeNums.Min()).ToArray() : RangeNums;
    }

Here's something I cooked up for something similar.

It handles the following types of ranges:

1        single number
1-5      range
-5       range from (firstpage) up to 5
5-       range from 5 up to (lastpage)
..       can use .. instead of -
;,       can use semicolon, comma, or space as separators

It does not check for duplicate values, so the set 1,5,-10 will produce the sequence 1, 5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

// requires: using System; using System.Collections.Generic; using System.Text.RegularExpressions;
public class RangeParser
{
    public static IEnumerable<Int32> Parse(String s, Int32 firstPage, Int32 lastPage)
    {
        String[] parts = s.Split(' ', ';', ',');
        Regex reRange = new Regex(@"^\s*((?<from>\d+)|(?<from>\d+)(?<sep>(-|\.\.))(?<to>\d+)|(?<sep>(-|\.\.))(?<to>\d+)|(?<from>\d+)(?<sep>(-|\.\.)))\s*$");
        foreach (String part in parts)
        {
            Match maRange = reRange.Match(part);
            if (maRange.Success)
            {
                Group gFrom = maRange.Groups["from"];
                Group gTo = maRange.Groups["to"];
                Group gSep = maRange.Groups["sep"];

                if (gSep.Success)
                {
                    Int32 from = firstPage;
                    Int32 to = lastPage;
                    if (gFrom.Success)
                        from = Int32.Parse(gFrom.Value);
                    if (gTo.Success)
                        to = Int32.Parse(gTo.Value);
                    for (Int32 page = from; page <= to; page++)
                        yield return page;
                }
                else
                    yield return Int32.Parse(gFrom.Value);
            }
        }
    }
}
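
Since the parser above does not check for duplicates, the caller can clean up the result afterwards. A minimal sketch, assuming the order of pages does not matter to you; the names PageSequenceCleanup and DistinctSorted are hypothetical and not part of the answer's code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PageSequenceCleanup
{
    // Hypothetical helper: drops repeated pages and sorts the rest ascending.
    public static List<int> DistinctSorted(IEnumerable<int> pages)
    {
        return pages.Distinct().OrderBy(p => p).ToList();
    }
}
```

Applied to the sequence 1, 5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 from the example above, this returns 1 through 10 with each page listed once. If you need to keep the order in which pages were entered, Distinct() on its own is enough.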

You can't be sure until you have test cases. In my case I would prefer the input to be whitespace-delimited instead of comma-delimited, which makes the parsing a little more complex.

    [Fact]
    public void ShouldBeAbleToParseRanges()
    {
        RangeParser.Parse( "1" ).Should().BeEquivalentTo( 1 );
        RangeParser.Parse( "-1..2" ).Should().BeEquivalentTo( -1,0,1,2 );

        RangeParser.Parse( "-1..2 " ).Should().BeEquivalentTo( -1,0,1,2 );
        RangeParser.Parse( "-1..2 5" ).Should().BeEquivalentTo( -1,0,1,2,5 );
        RangeParser.Parse( " -1  ..  2 5" ).Should().BeEquivalentTo( -1,0,1,2,5 );
    }

Note that Keith's answer (or a small variation of it) will fail the last test, where there is whitespace around the range token. Handling that requires a tokenizer and a proper parser with lookahead.

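using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

namespace Utils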
{
    public class RangeParser
    {

        public class RangeToken
        {
            public string Name;
            public string Value;
        }

        public static IEnumerable<RangeToken> Tokenize(string v)
        {
            var pattern =
                @"(?<number>-?[0-9]+)|" + // [1-9]+[0-9]* could never match a lone 0
                @"(?<range>\.\.)";

            var regex = new Regex( pattern );
            var matches = regex.Matches( v );
            foreach (Match match in matches)
            {
                var numberGroup = match.Groups["number"];
                if (numberGroup.Success)
                {
                    yield return new RangeToken {Name = "number", Value = numberGroup.Value};
                    continue;
                }
                var rangeGroup = match.Groups["range"];
                if (rangeGroup.Success)
                {
                    yield return new RangeToken {Name = "range", Value = rangeGroup.Value};
                }

            }
        }

        public enum State { Start, Unknown, InRange}

        public static IEnumerable<int> Parse(string v)
        {

            var tokens = Tokenize( v );
            var state = State.Start;
            var number = 0;

            foreach (var token in tokens)
            {
                switch (token.Name)
                {
                    case "number":
                        var nextNumber = int.Parse( token.Value );
                        switch (state)
                        {
                            case State.Start:
                                number = nextNumber;
                                state = State.Unknown;
                                break;
                            case State.Unknown:
                                yield return number;
                                number = nextNumber;
                                break;
                            case State.InRange:
                                int rangeLength = nextNumber - number+ 1;
                                foreach (int i in Enumerable.Range( number, rangeLength ))
                                {
                                    yield return i;
                                }
                                state = State.Start;
                                break;
                            default:
                                throw new ArgumentOutOfRangeException();
                        }
                        break;
                    case "range":
                        switch (state)
                        {
                            case State.Start:
                                throw new ArgumentOutOfRangeException();
                            case State.Unknown:
                                state = State.InRange;
                                break;
                            case State.InRange:
                                throw new ArgumentOutOfRangeException();
                            default:
                                throw new ArgumentOutOfRangeException();
                        }
                        break;
                    default:
                        throw new ArgumentOutOfRangeException( nameof( token ) );
                }
            }
            switch (state)
            {
                case State.Start:
                    break;
                case State.Unknown:
                    yield return number;
                    break;
                case State.InRange:
                    break;
                default:
                    throw new ArgumentOutOfRangeException();
            }
        }
    }
}

One line approach with Split and Linq

string input = "1,3,5-10,12";
IEnumerable<int> result = input.Split(',').SelectMany(x => x.Contains('-') ? Enumerable.Range(int.Parse(x.Split('-')[0]), int.Parse(x.Split('-')[1]) - int.Parse(x.Split('-')[0]) + 1) : new int[] { int.Parse(x) });
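
The one-liner above splits and parses each range several times. As a rough alternative, here is a sketch that splits each part once. The names PageRangeLinq and Expand are made up for illustration, and like the original it does no validation: malformed input throws from int.Parse, and a backwards range throws from Enumerable.Range.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PageRangeLinq
{
    // Hypothetical helper: same idea as the one-liner, but each part is split only once.
    public static IEnumerable<int> Expand(string input)
    {
        return input.Split(',')
            // "5-10" becomes { 5, 10 }; a single page "3" becomes { 3 }.
            .Select(part => part.Split('-').Select(s => int.Parse(s)).ToArray())
            // First and last element are the bounds; for a single page they coincide.
            .SelectMany(b => Enumerable.Range(b[0], b[b.Length - 1] - b[0] + 1));
    }
}
```

For the input "1,3,5-10,12" this yields 1,3,5,6,7,8,9,10,12.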

Here's a slightly modified version of lassevk's code that handles the string.Split operation inside of the Regex match. It's written as an extension method, and you can easily handle the duplicates problem using the Distinct() extension from LINQ.

    /// <summary>
    /// Parses a string representing a range of values into a sequence of integers.
    /// </summary>
    /// <param name="s">String to parse</param>
    /// <param name="minValue">Minimum value for open range specifier</param>
    /// <param name="maxValue">Maximum value for open range specifier</param>
    /// <returns>An enumerable sequence of integers</returns>
    /// <remarks>
    /// The range is specified as a string in the following forms or combination thereof:
    /// 5           single value
    /// 1,2,3,4,5   sequence of values
    /// 1-5         closed range
    /// -5          open range (converted to a sequence from minValue to 5)
    /// 1-          open range (converted to a sequence from 1 to maxValue)
    /// 
    /// The value delimiter can be either ',' or ';' and the range separator can be
    /// either '-' or ':'. Whitespace is permitted at any point in the input.
    /// 
    /// Any elements of the sequence that contain non-digit, non-whitespace, or non-separator
    /// characters or that are empty are ignored and not returned in the output sequence.
    /// </remarks>
    public static IEnumerable<int> ParseRange2(this string s, int minValue, int maxValue) {
        const string pattern = @"(?:^|(?<=[,;]))                      # match must begin with start of string or delim, where delim is , or ;
                                 \s*(                                 # leading whitespace
                                 (?<from>\d*)\s*(?:-|:)\s*(?<to>\d+)  # capture 'from <sep> to' or '<sep> to', where <sep> is - or :
                                 |                                    # or
                                 (?<from>\d+)\s*(?:-|:)\s*(?<to>\d*)  # capture 'from <sep> to' or 'from <sep>', where <sep> is - or :
                                 |                                    # or
                                 (?<num>\d+)                          # capture lone number
                                 )\s*                                 # trailing whitespace
                                 (?:(?=[,;\b])|$)                     # match must end with end of string or delim, where delim is , or ;";

        Regex regx = new Regex(pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled);

        foreach (Match m in regx.Matches(s)) {
            Group gpNum = m.Groups["num"];
            if (gpNum.Success) {
                yield return int.Parse(gpNum.Value);

            } else {
                Group gpFrom = m.Groups["from"];
                Group gpTo = m.Groups["to"];
                if (gpFrom.Success || gpTo.Success) {
                    int from = (gpFrom.Success && gpFrom.Value.Length > 0 ? int.Parse(gpFrom.Value) : minValue);
                    int to = (gpTo.Success && gpTo.Value.Length > 0 ? int.Parse(gpTo.Value) : maxValue);

                    for (int i = from; i <= to; i++) {
                        yield return i;
                    }
                }
            }
        }
    }

The answer I came up with:

static IEnumerable<string> ParseRange(string str)
{
    var numbers = str.Split(',');

    foreach (var n in numbers)
    {
       if (!n.Contains("-")) 
           yield return n;
       else
       {
           string startStr = String.Join("", n.TakeWhile(c => c != '-'));
           int startInt = Int32.Parse(startStr);

           string endStr = String.Join("", n.Reverse().TakeWhile(c => c != '-').Reverse());
           int endInt = Int32.Parse(endStr);

           var range = Enumerable.Range(startInt, endInt - startInt + 1)
                                 .Select(num => num.ToString());

           foreach (var s in range)
               yield return s;
        }
    }
}

Regex is not as efficient as the following code. For simple delimiter-based formats like this one, plain string methods are generally cheaper than Regex and should be preferred when possible.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] inputs = {
                                 "001-005/015",
                                 "009/015"
                             };

            foreach (string input in inputs)
            {
                List<int> numbers = new List<int>();
                string[] strNums = input.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
                foreach (string strNum in strNums)
                {
                    if (strNum.Contains("-"))
                    {
                        int startNum = int.Parse(strNum.Substring(0, strNum.IndexOf("-")));
                        int endNum = int.Parse(strNum.Substring(strNum.IndexOf("-") + 1));
                        for (int i = startNum; i <= endNum; i++)
                        {
                            numbers.Add(i);
                        }
                    }
                    else
                        numbers.Add(int.Parse(strNum));
                }
                Console.WriteLine(string.Join(",", numbers.Select(x => x.ToString())));
            }
            Console.ReadLine();

        }
    }
}

My solution:

  • returns a list of integers
  • reversed/typo/duplicate ranges are allowed: 1,-3,5-,7-10,12-9 => 1,3,5,7,8,9,10,12,11,10,9 (useful when you want to extract or repeat pages)
  • option to set the total number of pages: 1,-3,5-,7-10,12-9 (Nmax=9) => 1,3,5,6,7,8,9,7,8,9,9
  • autocomplete: 1,-3,5-,8 (Nmax=9) => 1,3,5,6,7,8,9,8

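    // requires: using System; using System.Collections.Generic; using System.Linq;
    public static List<int> pageRangeToList(string pageRg, int Nmax = 0)
    {
        List<int> ls = new List<int>();
        int lb, ub, i;
        foreach (string ss in pageRg.Split(','))
        {
            if (int.TryParse(ss, out lb)) {
                ls.Add(Math.Abs(lb)); // "-3" parses as -3, so take it as page 3
            } else {
                var subls = ss.Split('-').ToList();
                if (subls.Count < 2) continue; // neither a number nor a range: skip it
                lb = (int.TryParse(subls[0], out i)) ? i : 0;
                ub = (int.TryParse(subls[1], out i)) ? i : Nmax;
                ub = ub > 0 ? ub : lb; // if ub=0, take 1 value of lb
                for (i = 0; i <= Math.Abs(ub - lb); i++)
                    ls.Add(lb < ub ? i + lb : lb - i);
            }
        }
        if (ls.Count == 0) return ls; // nothing parsed; ls.Max() would throw on an empty list
        Nmax = Nmax > 0 ? Nmax : ls.Max(); // real Nmax
        return ls.Where(s => s > 0 && s <= Nmax).ToList();
    }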
    
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow