Question

How do I match an expression where I need to do an or of another set?

i.e., how do I match something of the format

[
  [
    [ a | b ] |
    [ x | y ]
  ]
]

where a, b, x and y are strings.

I want to match the phrases like

a
b
x
y
a x
a y
b x
b y
x a
x b
y a
y b

But not the ones like:

a b
x y
z z 

I'm trying to use it in Boost Xpressive so I have the option to use either ECMAScript or Perl type regular expressions.


Solution

You can do it like this:

[ab] [xy]|[xy] [ab]|[abxy]

There are 3 choices here:

  • A single character: a, b, x or y.
  • Or two characters: a or b, then a space, then x or y.
  • Or two characters: x or y, then a space, then a or b.

I put [abxy] last so that a search tries the paired alternatives first and falls back to a single character only when no pair matches. The order matters when you use the regex to search, but it makes no real difference when you validate a whole string.
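To see why the order of the alternatives matters when searching, here is a minimal sketch using Python's `re` module as a stand-in for Xpressive. It assumes Xpressive tries alternatives left to right in the same way, as the paragraph above implies. The variable names are illustrative only.

```python
import re

# Pairs listed first: the two-token alternatives are tried before the single one.
pairs_first = re.compile(r"[ab] [xy]|[xy] [ab]|[abxy]")
# Single token listed first: it succeeds immediately, so the pair is never tried.
single_first = re.compile(r"[abxy]|[ab] [xy]|[xy] [ab]")

text = "a x"
long_match = pairs_first.search(text).group()
short_match = single_first.search(text).group()
print(repr(long_match), repr(short_match))
```

With the pairs first, the search returns the whole phrase; with the single-character alternative first, it stops after the first character.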

Another way to write it:

[ab]( [xy])?|[xy]( [ab])?

That only works for single characters, but you can make it work for strings. For example, let's say you have 4 strings s1, s2, s3, s4:

(s1|s2)( (s3|s4))?|(s3|s4)( (s1|s2))?

It searches for:

  • Either s1 or s2, optionally followed (0 or 1 times) by a space and s3 or s4
  • Or s3 or s4, optionally followed by a space and s1 or s2

This covers all the cases: s1, s2, etc. (single string) and s2 s3, s3 s2, etc. (paired up, in either order). The regex above tries the longer (paired) version before falling back to the single string, because quantifiers are greedy by default.

Note that I am using capturing groups (pattern) in the regex above, which record the text matched by the pattern inside. You can turn them into non-capturing groups (?:pattern) if you don't need to refer to the matched text. This will save you some clock cycles.

(?:s1|s2)(?: (?:s3|s4))?|(?:s3|s4)(?: (?:s1|s2))?

(I leave the task of changing capturing group to non-capturing group for the other regex as an exercise. It is as simple as adding ?:)
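Writing the string version by hand gets tedious as the word lists grow. The sketch below generates the non-capturing pattern from two lists. It uses Python's `re` module rather than Xpressive, and `build_pattern` is a hypothetical helper name, not part of any library.

```python
import re

def build_pattern(first, second):
    """Return a regex for one word from either list, or one from each in either order."""
    # re.escape stops metacharacters in the words from being read as regex syntax.
    g1 = "(?:" + "|".join(map(re.escape, first)) + ")"
    g2 = "(?:" + "|".join(map(re.escape, second)) + ")"
    return f"{g1}(?: {g2})?|{g2}(?: {g1})?"

pattern = build_pattern(["s1", "s2"], ["s3", "s4"])
print(pattern)
```

If one word is a prefix of another, list the longer word first when the pattern is used for searching, for the same reason the paired alternatives come before the single ones.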

Searching or Validation?

If you want to find such a pattern inside a larger text, then the regexes above should work for you.

If you want to validate that the whole string matches the pattern, you need to add the anchors ^ (beginning of the string) and $ (end of the string) to make sure the string follows the exact format:

^([ab] [xy]|[xy] [ab]|[abxy])$
^([ab]( [xy])?|[xy]( [ab])?)$
^((s1|s2)( (s3|s4))?|(s3|s4)( (s1|s2))?)$
^(?:(?:s1|s2)(?: (?:s3|s4))?|(?:s3|s4)(?: (?:s1|s2))?)$

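Note that I surround each regex from the sections above with () (a capturing group, although only grouping is needed here). This is because there is an alternation | inside: without the group, | binds more loosely than the anchors, so ^ would apply only to the first alternative and $ only to the last.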
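As a quick check, the sketch below runs the anchored character pattern against the phrases from the question, and shows what goes wrong when the outer group is left out. It uses Python's `re` module as a stand-in for Xpressive, and the list names are illustrative.

```python
import re

# Anchored pattern from above; the outer group is non-capturing here.
validator = re.compile(r"^(?:[ab]( [xy])?|[xy]( [ab])?)$")
# Same pattern without the outer group: ^ and $ each bind to one alternative only.
ungrouped = re.compile(r"^[ab]( [xy])?|[xy]( [ab])?$")

should_match = ["a", "b", "x", "y", "a x", "a y", "b x", "b y",
                "x a", "x b", "y a", "y b"]
should_fail = ["a b", "x y", "z z"]

accepted = [s for s in should_match + should_fail if validator.search(s)]
print(accepted == should_match)

# The ungrouped variant wrongly accepts "a b": its first alternative only needs a leading "a".
print(bool(ungrouped.search("a b")))
```

In Python, `$` also matches just before a trailing newline; `re.fullmatch` avoids that if it matters for your input.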

Extensibility and Limitations

  • You can add more strings to either the first group or second group as you like:

    ^([abcd]( [xyz])?|[xyz]( [abcd])?)$
    
  • However, if you want to increase the number of groups, I suggest that you split the string by spaces and loop through the tokens to find the permutations of the group, rather than using regex.
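The token-based alternative could look like the sketch below. It assumes the generalised rule is "at most one token from each group, in any order" and that the groups are disjoint; both are my reading of the question, and `is_valid` is a hypothetical helper name.

```python
def is_valid(phrase, groups):
    """Accept 1..len(groups) tokens, each taken from a different group, in any order."""
    tokens = phrase.split()
    if not tokens or len(tokens) > len(groups):
        return False
    used = set()
    for token in tokens:
        # Find the group this token belongs to (assumes the groups are disjoint).
        index = next((i for i, g in enumerate(groups) if token in g), None)
        if index is None or index in used:
            return False  # unknown token, or a second token from the same group
        used.add(index)
    return True

groups = [{"a", "b"}, {"x", "y"}, {"p", "q"}]
print(is_valid("q a y", groups), is_valid("a b", groups))
```

Note that `split()` accepts any run of whitespace between tokens, which is more lenient than the single space in the regexes above.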

Other tips

There isn't a convenient way to do this without repeating a, b, x and y in the regular expression, but this problem can be alleviated by building the expression from pre-declared sub-expressions.

This code demonstrates the idea. Note that the first three lines of the DATA are invalid, so they aren't reproduced in the output.

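use v5.10;
use strict;
use warnings;

# Reusable sub-expressions; qr// keeps each one grouped when interpolated.
my $ab = qr/a|b/;
my $xy = qr/x|y/;

# /x ignores whitespace in the pattern, so the separator is matched with \s+.
my $re = qr/^
(?:
  $ab (?: \s+ $xy)? | $xy (?: \s+ $ab)?
)
$/x;

# Print only the DATA lines that match the whole pattern.
while (<DATA>) {
  print if /$re/;
}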


__DATA__
a b
x y
z z
a
b
x
y
a x
a y
b x
b y
x a
x b
y a
y b

Output:

a
b
x
y
a x
a y
b x
b y
x a
x b
y a
y b

Try this:

^((a|b)( x| y)?|(x|y)( a| b)?)$

Anatomy of the regex:

# ^            - Start of line
# (            - Group start
# (a|b)( x| y) - Match a or b followed by a space and x or y
# ?            - Where ( x| y) is optional
# |            - Or
# (x|y)( a| b) - Match x or y followed by a space and a or b
# ?            - Where ( a| b) is optional
# )            - End of group
# $            - End of line

This matches:

a
b
x
y
a x
y b

But not:

a b
x y
z z
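The commented breakdown above can also be written as a real pattern, using a verbose (free-spacing) mode where the engine supports one. The sketch below uses Python's `re.VERBOSE` as an illustration. In that mode unescaped whitespace in the pattern is ignored, so the literal space is written as `[ ]`.

```python
import re

annotated = re.compile(r"""
    ^                     # start of string
    (?:
        (?:a|b)           # a or b
        (?:[ ](?:x|y))?   # optionally a space and x or y
      |                   # or
        (?:x|y)           # x or y
        (?:[ ](?:a|b))?   # optionally a space and a or b
    )
    $                     # end of string
""", re.VERBOSE)

for phrase in ["a", "a x", "y b", "a b", "x y", "z z"]:
    print(phrase, bool(annotated.search(phrase)))
```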

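Simple, with anchors so that the whole string must match (without them, a search would find the single a inside "a b"):

^(a|b|x|y|((a|b) (x|y))|((x|y) (a|b)))$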
Licensed under: CC-BY-SA with attribution