Problem fine-tuning a regex
Question
I have a regex that seemed fine, but as it turned out it does not work well in some situations.
Keep an eye on the message preview, because the message editor does some tricky things with "\".
[\[]?[\^%#\$\*@\-;].*?[\^%#\$\*@\-;][\]]
Its task is to find a pattern which in general looks like this:
[ABA]
- A - char from set ^,%,#,$,*,@,-,;
- B - some text
- [ and ] are included in pattern
It is expected to find all occurrences of this pattern in the test string:
Black fox [#sample1#] [%sample2%] - [#sample3#] eats blocks.
But instead of the expected list of matches
- "[#sample1#]"
- "[%sample2%]"
- "[#sample3#]"
I get this
- "[#sample1#]"
- "[%sample2%]"
- "- [#sample3#]"
It seems this problem will also occur with the other characters in set "A". Could somebody suggest changes to my regex to make it work as I need?
A less important point: how can I make my regex exclude patterns which look like this?
[ABC]
- A - char from set ^,%,#,$,*,@,-,;
- B - some text
- C - char from set ^,%,#,$,*,@,-,; other than A
- [ and ] are included in pattern
For example:
[$sample1#] [%sample2@] [%sample3;]
Thanks in advance,
MTH
Solution
\[([%#$*@;^-]).+?\1\]
applied to text:
Black fox [#sample1#] [%sample2%] - [#sample3#] [%sample4;] eats blocks.
matches
[#sample1#]
[%sample2%]
[#sample3#]
but not
[%sample4;]
EDIT
This works for me (Output as expected, regex accepted by C# as expected):
// Requires: using System; using System.Text.RegularExpressions;
Regex re = new Regex(@"\[([%#$*@;^-]).+?\1\]");
string s = "Black fox [#sample1#] [%sample2%] - [#sample3#] [%sample4;] eats blocks.";
MatchCollection mc = re.Matches(s);
// Print every full match, one per line.
foreach (Match m in mc)
{
    Console.WriteLine(m.Value);
}
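For readers outside .NET, here is a minimal sketch of the same check in Python. It assumes Python's `re` module handles this pattern the same way, which should hold since the pattern uses only a character class, a lazy quantifier and a backreference. The variable names are illustrative only.

```python
import re

# Group 1 captures the opening delimiter; \1 requires the same
# character again immediately before the closing bracket.
pattern = re.compile(r"\[([%#$*@;^-]).+?\1\]")

text = "Black fox [#sample1#] [%sample2%] - [#sample3#] [%sample4;] eats blocks."

matches = [m.group(0) for m in pattern.finditer(text)]
delimiters = [m.group(1) for m in pattern.finditer(text)]

for full, delim in zip(matches, delimiters):
    print(full, "uses delimiter", delim)
```

The backreference is what rejects [%sample4;]: after capturing "%", the pattern can only finish on "%]", and no such sequence follows in the text.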
OTHER TIPS
Why the first "?" in "[\[]?"? It makes the opening bracket optional, so a match can start at a bare delimiter such as "-".
\[[\^%#\$\*@\-;].*?[\^%#\$\*@\-;]\]
would detect your different strings just fine.
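A small Python sketch comparing the regex from the question with the version where the opening bracket is required may make the difference concrete. Python is used only for illustration, and the names `original` and `required` are made up for this example.

```python
import re

text = "Black fox [#sample1#] [%sample2%] - [#sample3#] eats blocks."

# Regex from the question: the opening bracket is optional.
original = re.compile(r"[\[]?[\^%#\$\*@\-;].*?[\^%#\$\*@\-;][\]]")
# Same regex with the opening bracket made mandatory.
required = re.compile(r"\[[\^%#\$\*@\-;].*?[\^%#\$\*@\-;]\]")

original_matches = original.findall(text)
required_matches = required.findall(text)

print(original_matches)  # third match starts at the stray "-"
print(required_matches)  # all three matches start at "["
```

With the optional bracket, the lone "-" is accepted as the first delimiter and the lazy `.*?` then runs on to the next delimiter followed by "]", which is why "- [#sample3#]" is returned.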
To be more precise:
\[([\^%#\$\*@\-;])([^\]]*?)(?=\1)([\^%#\$\*@\-;])\]
would detect [ABA]
\[([\^%#\$\*@\-;])([^\]]*?)(?!\1)([\^%#\$\*@\-;])\]
would detect [ABC]
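The two lookahead variants can be tried side by side. The following is a sketch in Python, assuming its `re` module treats `(?=\1)` and `(?!\1)` the same way as the engine the question targets. The test string is made up from the samples in the question.

```python
import re

# (?=\1): the closing delimiter must equal the captured opening one.
aba = re.compile(r"\[([\^%#\$\*@\-;])([^\]]*?)(?=\1)([\^%#\$\*@\-;])\]")
# (?!\1): the closing delimiter must differ from the opening one.
abc = re.compile(r"\[([\^%#\$\*@\-;])([^\]]*?)(?!\1)([\^%#\$\*@\-;])\]")

text = "[#sample1#] [$sample1#] [%sample2@] [%sample3;] [%sample4%]"

# group(0) is the whole match; findall would return tuples of groups here.
aba_matches = [m.group(0) for m in aba.finditer(text)]
abc_matches = [m.group(0) for m in abc.finditer(text)]

print(aba_matches)
print(abc_matches)
```

Because the middle part is `[^\]]*?` rather than `.*?`, neither variant can run past a closing bracket into the next token.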
You have an optional matching of the opening square bracket, so a match can begin at any delimiter character (such as the "-" before "[#sample3#]"):
[\[]?
For the second part of your question (and to perhaps simplify) try this:
\[\%[^\%]+\%\]|\[\#[^\#]+\#\]|\[\$[^\$]+\$\]
In this case there is a sub pattern for each possible delimiter. The | character is "OR", so it will match if any of the 3 sub expressions match.
Each subexpression matches, in order:
- Opening bracket
- Special char
- Everything that is not that special char (1)
- The same special char
- Closing bracket
(1) You may need to add extra exclusions like ']' or '[' so it doesn't accidentally match across a large body of text like:
[%MyVar#] blah blah [$OtherVar%]
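Here is a sketch of that caveat in Python, used for illustration only. The `strict` pattern is one possible way to add the extra exclusions, not necessarily the exact form intended above. Like the original, it covers only the three delimiters %, # and $.

```python
import re

# Pattern as given: each branch excludes only its own delimiter.
loose = re.compile(r"\[\%[^\%]+\%\]|\[\#[^\#]+\#\]|\[\$[^\$]+\$\]")
# Assumed hardening: also exclude "[" and "]" from the inner text.
strict = re.compile(r"\[%[^%\[\]]+%\]|\[#[^#\[\]]+#\]|\[\$[^$\[\]]+\$\]")

mixed = "[%MyVar#] blah blah [$OtherVar%]"
clean = "[%a%] [#b#] [$c$]"

loose_mixed = loose.findall(mixed)
strict_mixed = strict.findall(mixed)
strict_clean = strict.findall(clean)

print(loose_mixed)   # one match spanning both bracketed tokens
print(strict_mixed)  # no match: neither token closes with its opener
print(strict_clean)  # the three well-formed tokens
```

Without the bracket exclusions, `[^\%]+` happily consumes "#] blah blah [$OtherVar" on its way to the next "%", so the whole line is reported as a single match.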
Rob