Regex group empty

https://stackoverflow.com/questions/23676263

c#
regex

23-07-2023

Question

This is my code:

private static Regex paginationRegex = new Regex("<div class=\"pagination\">.*?<ul>(?<lis>.*?)</ul></div>",
    RegexOptions.Singleline | RegexOptions.IgnoreCase);

static void Main(string[] args)
{
    string output = File.ReadAllText("output.html");

    var match = paginationRegex.Match(output);

    var lis = match.Groups["lis"].Value;
}

and this is my HTML in output.html:

<div class="pagination">
        <ul>
                <li><a href="javascript:searchPage('1')" class="arrowDeactiveLeftFirst"> </a></li>  
                            <li><a href="javascript:searchPage('1')" class="deActivateleftArrow"> </a></li>
                    <li>
                                    <a class="current" href="javascript:searchPage('1')">1</a>
                                </li>
          <li>
                                    <a href="javascript:searchPage('2')">2</a> 
                                </li>
          <li>
                                    <a href="javascript:searchPage('3')">3</a> 
                                </li>
                      <li><a href="javascript:searchPage('2')" class="rightArrow"> </a></li>
                          <li><a href="javascript:searchPage('730')" class="arrowRightLast"> </a></li>
              </ul>
      </div>

However, the lis group is always empty. What am I missing?

Solution

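I think this is because your pattern requires </div> to follow </ul> immediately, while your HTML has a line break and indentation between the two tags. The whole match therefore fails, and the lis group of a failed match is just an empty string. Allowing optional whitespace (\s*) between the two tags seems to fix the issue: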

//                                                                                  \/
Regex paginationRegex = new Regex("<div class=\"pagination\">.*?<ul>(?<lis>.*?)</ul>\\s*</div>",
                        RegexOptions.IgnoreCase | RegexOptions.Singleline);
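
To see the effect end to end, here is a minimal sketch that runs the corrected pattern against a shortened inline HTML string instead of output.html. The class name PaginationDemo and the sample markup are made up for illustration. It also checks match.Success before reading the group, since a failed match returns an empty string for every group, which is easy to mistake for an empty capture.

```csharp
using System;
using System.Text.RegularExpressions;

static class PaginationDemo
{
    // Same pattern as above: \s* allows whitespace between </ul> and </div>.
    private static readonly Regex PaginationRegex = new Regex(
        "<div class=\"pagination\">.*?<ul>(?<lis>.*?)</ul>\\s*</div>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    static void Main()
    {
        // Hypothetical stand-in for the contents of output.html.
        string html =
            "<div class=\"pagination\">\n" +
            "    <ul>\n" +
            "        <li><a href=\"javascript:searchPage('2')\">2</a></li>\n" +
            "    </ul>\n" +
            "</div>";

        Match match = PaginationRegex.Match(html);

        if (match.Success)
        {
            // Trim the surrounding line breaks and indentation of the capture.
            Console.WriteLine(match.Groups["lis"].Value.Trim());
        }
        else
        {
            // Groups["lis"].Value would be "" here, not a real capture.
            Console.WriteLine("No pagination block found.");
        }
    }
}
```

Removing the \\s* from the pattern should send the same input down the else branch, which reproduces the behaviour described in the question.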

I'm also obliged to mention that regular expressions often aren't the best tool for parsing HTML, because small changes in the markup (extra whitespace, attribute order, nesting) can break a pattern. Check out Html Agility Pack, a library built for parsing HTML.
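
For comparison, a rough sketch of the same extraction with Html Agility Pack, assuming the HtmlAgilityPack NuGet package is installed and that output.html contains the markup from the question. The class name PaginationParserDemo and the XPath expressions are my own choices, not something from the original post.

```csharp
using System;
using System.IO;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack" (assumed to be installed)

static class PaginationParserDemo
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(File.ReadAllText("output.html"));

        // Select the <ul> directly inside <div class="pagination">.
        // Whitespace between tags does not affect the query.
        HtmlNode ul = doc.DocumentNode.SelectSingleNode("//div[@class='pagination']/ul");
        if (ul == null)
        {
            Console.WriteLine("No pagination block found.");
            return;
        }

        // Rough equivalent of the "lis" group: the markup inside the <ul>.
        Console.WriteLine(ul.InnerHtml);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = ul.SelectNodes("./li/a");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                Console.WriteLine(link.GetAttributeValue("href", ""));
            }
        }
    }
}
```

Note that the XPath test @class='pagination' matches only when the class attribute is exactly that value, so a div with several classes would need a contains(...) test instead.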

Licensed under: CC-BY-SA with attribution
