Question

Using a C# Regex, I am trying to find text enclosed by two distinct pairs of words, say start1...end1 and start2...end2. In my example below I would like to get: text1, text2, text11, text22.

using System;
using System.Text.RegularExpressions;

string str = "This start1 text1 end1. And start2 text2 end2 is a test. This start1 text11 end1. And start2 text22 end2 is a test.";

Regex oRegEx = new Regex(@"start1(.*?)end1|start2(.*?)end2", RegexOptions.IgnoreCase);
MatchCollection oMatches = oRegEx.Matches(str);
if (oMatches.Count > 0)
{
    foreach (Match mt in oMatches)
    {
        Console.WriteLine(mt.Value);     //the display includes the start1 and end1 (or start2 and end2)
        Console.WriteLine(mt.Groups[1].Value); //the display excludes the start1 and end1 (or start2 and end2) or displays an empty string, depending on the order of the alternatives in the pattern.
    }
}

mt.Groups[1].Value in the above code correctly displays text1 and text11 when the pattern is @"start1(.*?)end1|start2(.*?)end2", but it displays empty strings for text2 and text22. If I swap the order to @"start2(.*?)end2|start1(.*?)end1", it correctly displays text2 and text22 but displays empty strings for text1 and text11. What needs to change in my code? An MSDN article explains when a group returns an empty string, but I am still not getting the desired results.


Solution

Give both groups the same name:

start1(?<val>.*?)end1|start2(?<val>.*?)end2

And read the value with:

mt.Groups["val"].Value
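A minimal sketch of the named-group version, written as a C# 9+ top-level program. The `results` list and the `Trim()` call are my own additions for illustration (the captured text includes the spaces between the delimiters and the words); .NET allows the same group name to appear in both alternatives.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string str = "This start1 text1 end1. And start2 text2 end2 is a test. " +
             "This start1 text11 end1. And start2 text22 end2 is a test.";

// Both alternatives capture into the same named group "val", so whichever
// alternative matched, the enclosed text is available under one name.
var regex = new Regex(@"start1(?<val>.*?)end1|start2(?<val>.*?)end2",
                      RegexOptions.IgnoreCase);

var results = new List<string>();
foreach (Match mt in regex.Matches(str))
{
    // Trim() drops the spaces between the delimiters and the text (assumption: they are not wanted).
    string value = mt.Groups["val"].Value.Trim();
    results.Add(value);
    Console.WriteLine(value);
}
```

With the sample string this should print text1, text2, text11 and text22, one per line.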

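The original problem is that, without names, the group between start1 and end1 has index 1 and the group between start2 and end2 has index 2. Each match uses only one alternative, so the group in the other alternative does not participate and its Value is an empty string; that is why Groups[1] is empty for the start2...end2 matches.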
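The numbering explanation can be checked directly. The sketch below is an alternative not given in the original answer: it keeps the unnamed groups from the question and picks whichever group actually participated, using `Group.Success`. The `results` list and `Trim()` are again illustrative additions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string str = "This start1 text1 end1. And start2 text2 end2 is a test. " +
             "This start1 text11 end1. And start2 text22 end2 is a test.";

// Unnamed groups are numbered left to right: group 1 belongs to the
// start1...end1 alternative, group 2 to the start2...end2 alternative.
var regex = new Regex(@"start1(.*?)end1|start2(.*?)end2", RegexOptions.IgnoreCase);

var results = new List<string>();
foreach (Match mt in regex.Matches(str))
{
    // Only the group in the alternative that matched has Success == true;
    // the other group did not participate and its Value is "".
    int index = mt.Groups[1].Success ? 1 : 2;
    string value = mt.Groups[index].Value.Trim();
    results.Add(value);
    Console.WriteLine($"group {index}: {value}");
}
```

For the sample string, the start1 matches report group 1 and the start2 matches report group 2.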

Another solution is to use a regex like:

(?<=start([12])).*?(?=end\1)


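Then, in your code,

Console.WriteLine(mt.Value);

displays the required content, because the lookbehind and lookahead are zero-width and are therefore not part of the match. Note that in this pattern group 1 holds the digit (1 or 2), not the enclosed text.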
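A minimal sketch of the lookaround version as a C# 9+ top-level program, assuming .NET's regex engine (which keeps captures made inside a positive lookbehind, so `\1` in the lookahead can refer to the digit). The `results` list and `Trim()` are illustrative additions, since `mt.Value` still includes the surrounding spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string str = "This start1 text1 end1. And start2 text2 end2 is a test. " +
             "This start1 text11 end1. And start2 text22 end2 is a test.";

// (?<=start([12])) : zero-width lookbehind; group 1 records the digit after "start".
// .*?              : the enclosed text, matched lazily.
// (?=end\1)        : zero-width lookahead requiring "end" followed by the same digit.
var regex = new Regex(@"(?<=start([12])).*?(?=end\1)", RegexOptions.IgnoreCase);

var results = new List<string>();
foreach (Match mt in regex.Matches(str))
{
    // The lookarounds consume nothing, so mt.Value is only the enclosed text.
    string value = mt.Value.Trim();
    results.Add(value);
    Console.WriteLine($"start{mt.Groups[1].Value}: {value}");
}
```

Because the delimiters are never part of the match, there is no need to pick a group at all; the trade-off is a pattern that is harder to read than the named-group version.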

Licensed under: CC-BY-SA with attribution