Question

I want to capture only the first match of the expression

<p>.*?</p>

I have tried <p>.*?</p>{1}, but it does not work: it returns all the p tags in the HTML document. Please help.


Solution

It looks like you are using a method that returns every match of a regex in the given string. If that is the case, anchor the regex to the beginning of the string so that it can match only once and returns just the first match:

^.*?<p>.*?</p>

Use parentheses to capture the part you want, because the overall match now also includes everything before the first <p>.
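As a rough sketch of that idea in C# (assuming .NET's Regex.Matches is the "every match" method in question), the anchored pattern with a capture group might be used as below. The class name, the Find helper and the sample HTML are made up for illustration, and RegexOptions.Singleline is an added assumption so that . can cross line breaks in a multi-line document.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstParagraph
{
    // '^' without RegexOptions.Multiline matches only at index 0, so the pattern can match at most once.
    // RegexOptions.Singleline (an assumption here) lets '.' cross line breaks.
    private static readonly Regex Anchored =
        new Regex("^.*?(<p>.*?</p>)", RegexOptions.Singleline);

    // Hypothetical helper: returns the first <p>...</p> block, or an empty string if there is none.
    public static string Find(string html)
    {
        MatchCollection matches = Anchored.Matches(html);
        return matches.Count > 0 ? matches[0].Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        string html = "<html>\n<p>first</p>\n<p>second</p>\n</html>";
        Console.WriteLine(Anchored.Matches(html).Count); // 1
        Console.WriteLine(Find(html));                   // <p>first</p>
    }
}
```

Group 1 holds only the paragraph, while the full match also contains the leading text that the anchor forced the pattern to consume.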

PS: Here goes the standard 'avoid using regex to parse HTML, use a proper HTML parser' advice. This simple regex will fail for nested <p> sections (these are not valid HTML, but you can still run into them in real-world markup).
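To make the nesting problem concrete, here is a small C# sketch. The input string and the FirstCapture helper are invented for this illustration. The lazy .*? stops at the first </p> it meets, which closes the inner paragraph, so the capture is cut short.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedParagraphDemo
{
    // Hypothetical helper: text captured by the first <p>(.*?)</p> match, or empty if none.
    public static string FirstCapture(string html)
    {
        Match match = Regex.Match(html, "<p>(.*?)</p>", RegexOptions.Singleline);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        // The lazy quantifier stops at the first </p>, which belongs to the inner paragraph.
        Console.WriteLine(FirstCapture("<p>outer <p>inner</p> tail</p>")); // outer <p>inner
    }
}
```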

OTHER TIPS

The Regex.Match method returns only the first match by default, and your regular expression is correct.

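using System;
using System.Text.RegularExpressions;
Regex regex = new Regex("<p>(.*?)</p>");
Match match = regex.Match("<p>1</p><p>2</p>");  // Match returns only the first match
Console.WriteLine("{0}", match.Groups[1].Value);  // Groups[1] is the text captured by the parentheses

Running this program will print 1. (Printing match.Value instead would give the whole match, <p>1</p>.)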
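For comparison, a minimal sketch of how Regex.Match and Regex.Matches differ on the same input, plus a Success check for the case where nothing matches. The class name and the "no paragraphs" string are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchVersusMatches
{
    public static void Main()
    {
        const string pattern = "<p>(.*?)</p>";
        const string input = "<p>1</p><p>2</p>";

        // Regex.Match stops at the first match.
        Match first = Regex.Match(input, pattern);
        if (first.Success)
        {
            Console.WriteLine(first.Groups[1].Value); // 1
        }

        // Regex.Matches collects every match, the behaviour described in the question.
        MatchCollection all = Regex.Matches(input, pattern);
        Console.WriteLine(all.Count); // 2

        // When nothing matches, Match still returns an object, but its Success is false.
        Console.WriteLine(Regex.Match("no paragraphs", pattern).Success); // False
    }
}
```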

Licensed under: CC-BY-SA with attribution