Returning only the first match using Regex Look-Behinds

https://stackoverflow.com/questions/16152535

11-04-2022

Question

Given the following XML document:

<root>
    <myGoodSection 
          some="attr" 
          another="attr" 
      />
    <myBadSection yet="anotherattr" />
</root>

How can I return the first /> using Regex? So far I've been able to get pretty close using the following expression:

(?ims)(?<=<myGoodSection.*?)/>

However, this will match every instance of /> that follows the first occurrence of <myGoodSection. I've also tried combining it with a negative look-behind in an effort to make the expression non-greedy, but it does not seem to have any effect:

(?ims)(?<=<myGoodSection.*?)(?<!/>)/>
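
The behavior described above can be reproduced with a short sketch. It is written in TypeScript rather than C#, on the assumption that the JavaScript engine's variable-length look-behind behaves like .NET's for this input. JavaScript does not accept the inline (?ims) group, so the flags are passed separately, and the helper name matchOffsets is made up for illustration.

```typescript
// Sample document from the question.
const xml: string = `<root>
    <myGoodSection
          some="attr"
          another="attr"
      />
    <myBadSection yet="anotherattr" />
</root>`;

// Hypothetical helper: start index of every match of `source` in `text`.
function matchOffsets(source: string, text: string): number[] {
  // "g" finds all matches, "i" ignores case, "s" lets "." cross newlines.
  // ("m" is omitted because neither pattern uses ^ or $.)
  const re = new RegExp(source, "gis");
  const offsets: number[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    offsets.push(m.index);
  }
  return offsets;
}

// First attempt: lazy quantifier inside the look-behind.
const lazyLookBehind = matchOffsets("(?<=<myGoodSection.*?)/>", xml);

// Second attempt: extra negative look-behind in front of the />.
const withNegativeLookBehind = matchOffsets("(?<=<myGoodSection.*?)(?<!/>)/>", xml);

console.log(lazyLookBehind.length, withNegativeLookBehind.length); // prints "2 2"
```

Both attempts report two matches. The lazy quantifier only changes how the look-behind searches for its prefix, not whether a prefix exists, and the negative look-behind only checks the two characters directly before each />, which are whitespace in both tags.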

Edit:

I am using a tool built on top of C# to handle the regex replacement. I have no control over how many of the matches are used, as I would if I were calling System.Text.RegularExpressions directly. I mention C# here only to clarify which features the engine I am using supports.

Yes, I am aware that as a matter of general practice I should not be using RegEx to parse XML. Let's just stipulate that given my current scope, requirements, and constraints that it is a perfectly acceptable solution (providing there's actually a way to accomplish it).

Solution

I was able to accomplish this by replacing the . inside the look-behind with \b[^>], so that my final expression becomes:

(?ims)(?<=<myGoodSection\b[^>]*?)/>

This matches a /> only when no > appears between <myGoodSection and that />. The look-behind therefore cannot reach back across the end of an earlier tag, which excludes the /> of every tag that follows the first match.
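
As a quick check of that expression, here is a minimal sketch in TypeScript. It assumes the JavaScript engine's look-behind is close enough to .NET's for this input. The flags are passed separately because JavaScript does not accept the inline (?ims) group, and the replacement text is a made-up placeholder.

```typescript
// Sample document from the question.
const xml: string = `<root>
    <myGoodSection
          some="attr"
          another="attr"
      />
    <myBadSection yet="anotherattr" />
</root>`;

// The accepted expression: the look-behind may only span characters
// that are not ">", so it cannot cross the end of an earlier tag.
const fixed = new RegExp("(?<=<myGoodSection\\b[^>]*?)/>", "gis");

// With the "g" flag, String.prototype.match returns every matched substring.
const found: string[] = xml.match(fixed) || [];

// Replace every match; only the /> of <myGoodSection should change.
// "[CLOSE]" is a hypothetical replacement used to make the result visible.
const replaced: string = xml.replace(fixed, "[CLOSE]");

console.log(found.length); // prints 1
console.log(replaced.includes('<myBadSection yet="anotherattr" />')); // prints true
```

Even though the replacement runs globally, the second tag is left untouched, which is what a tool that applies every match needs.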

OTHER TIPS

First off, you shouldn't use Regex to parse XML.

With that aside, you can have it return only the first match by using Regex.Match().
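
Regex.Match() is a .NET call. As a rough analogue only, the sketch below shows the same "stop after the first match" idea in TypeScript, where leaving out the global flag has that effect. It assumes the caller controls how the pattern is applied, and the replacement text is a made-up example.

```typescript
// Sample document from the question.
const xml: string = `<root>
    <myGoodSection
          some="attr"
          another="attr"
      />
    <myBadSection yet="anotherattr" />
</root>`;

// The original pattern from the question, without the "g" flag, so the
// engine stops after the first match ("i" = ignore case, "s" = dot matches newlines).
const firstOnly = new RegExp("(?<=<myGoodSection.*?)/>", "is");

// exec on a non-global pattern returns the first match, or null if there is none.
const first: RegExpExecArray | null = firstOnly.exec(xml);

// replace with a non-global pattern rewrites only that first match.
// "></myGoodSection>" is a hypothetical replacement for illustration.
const replaced: string = xml.replace(firstOnly, "></myGoodSection>");

console.log(first !== null && first.index === xml.indexOf("/>")); // prints true
console.log(replaced.split("/>").length - 1); // prints 1
```

One /> remains after the replacement, the one that belongs to myBadSection.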

Also, if your regex is simply returning too much, you could use non-greedy selection, like so:

(?ims)(?<=<myGoodSection.*?)/>

Note the ? after the *.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow