Problem

I am trying to implement a regex that can assign the same keyword, or a combination of keywords, to one or more named groups.

For example, I want to match ('aa' AND 'bb') OR 'cc', assigning 'aa' AND 'bb' to group<1> and 'cc' to group<2>.

I may also have a query like ('aa' AND 'bb') OR 'aa', in which case I want 'aa' AND 'bb' in group<1> and, at the same time, 'aa' in group<2>.

// This finds 'aa' everywhere, but I cannot find a way to add 'bb' to group<1>
(?=(?:\s+|^)(?<1>aa)(?:\s+|$)) 

EDIT :

Input Example : bb is nice but not without the missingaa
Output : Does not Validate, Group<1> is null | Group<2> is null

-

Input Example : bb is nice as well as aa
Output : Validate, Group<1> : bb is nice as well as aa | Group<2> is null

-

Input Example : bb is nice but not without the missingaa or cc
Output : Validate, Group<1> is null | Group<2> is cc

-

Input Example : bb is nice as well as aa or cc
Output : Validate, Group<1> is bb is nice as well as aa | Group<2> is cc

I know the grouping might be complicated, but I want Group<1> to be non-null when both aa and bb are present.

How can I achieve this behavior?


Solution

As a point of reference, in most regex engines a group keeps only its most recent match; captures don't accumulate like an array. .NET is an exception: it exposes every capture of a group through a collection.
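To illustrate the point about captures not accumulating, here is a small sketch in Python (chosen only for illustration; the question itself targets a .NET-style engine). The sample string and variable names are assumptions. It shows a repeated group retaining just its last iteration, and one common workaround of matching the keywords one at a time.

```python
import re

text = "aa bb cc"

# A repeated capturing group keeps only the text of its final iteration.
repeated = re.match(r"(?:(\w+)\s*)+", text)
last_capture = repeated.group(1)

# To collect every keyword, match them one at a time instead.
all_captures = re.findall(r"\b(?:aa|bb|cc)\b", text)

print(last_capture)   # cc
print(all_captures)   # ['aa', 'bb', 'cc']
```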

My apologies, you were right: it needs the alternation.
However, you have to force the engine to try the first branch (aa AND bb) and still pick up the OR term (cc) after it. This is done with a conditional whose test is a lookahead. Good luck!

 # ^.*?(?:(?:(?<grp1>(?:\baa\b.*?\bbb\b|\bbb\b.*?\baa\b))(?(?=.*\b(?:cc|aa)\b).*(?<grp2>(?:\bcc\b|\baa\b))|))|(?<grp2>\b(?:cc|aa)\b))

  ^ 
  .*? 
  (?:
       (?:                           # Force find   a AND b, OR c
            (?<grp1>
                 (?:
                      \b aa \b .*? \b bb \b 
                   |  \b bb \b .*? \b aa \b 
                 )
            )
            (?(?=                  # conditional assertion, force to find 
                 .*
                 \b (?:  cc | aa  ) \b 
              )
                 .* 
                 (?<grp2>
                      \b (?:  cc | aa  ) \b
                 )
              |  
            )
       )
    |  
       (?<grp2>              # Else, force find   OR c
            \b (?:  cc | aa  ) \b 
       )
  )
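
For readers who want to try the pattern against the four inputs from the question, here is a hedged adaptation in Python. It is not the pattern above verbatim: Python's `re` module has no lookahead conditionals and rejects duplicate group names, so this sketch substitutes a greedy optional group for the conditional and uses two distinct names, `grp2a` and `grp2b`, which the hypothetical helper `match_groups` merges back into one value.

```python
import re

# Adaptation, not the original pattern: the conditional becomes an optional
# group, and the duplicated grp2 is split into grp2a / grp2b (assumed names).
PATTERN = re.compile(
    r"^.*?(?:"
    r"(?P<grp1>\baa\b.*?\bbb\b|\bbb\b.*?\baa\b)"   # aa AND bb, either order
    r"(?:.*(?P<grp2a>\b(?:cc|aa)\b))?"             # optional OR term after it
    r"|(?P<grp2b>\b(?:cc|aa)\b)"                   # else the OR term alone
    r")"
)

def match_groups(text):
    """Return (group1, group2), or None when the input does not validate."""
    m = PATTERN.match(text)
    if m is None:
        return None
    return m.group("grp1"), m.group("grp2a") or m.group("grp2b")

samples = [
    "bb is nice but not without the missingaa",
    "bb is nice as well as aa",
    "bb is nice but not without the missingaa or cc",
    "bb is nice as well as aa or cc",
]
for sample in samples:
    print(repr(sample), "->", match_groups(sample))
```

The first input returns `None`, because `missingaa` has no word boundary before `aa`; the other three fill the groups as the question's expected output describes.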

Edit: The version below also matches pairs such as (aa cc), (bb).
But beware: the more permutations you add, the more complex the pattern becomes. That leads down the road to assertions, flags, and conditionals, all of which slow performance and make the pattern harder to maintain.

 # ^.*?(?:(?:(?<grp1>(?:\baa\b(?:(?!cc).)*?\bbb\b|\baa\b(?:(?!bb).)*?\bcc\b|\bbb\b(?:(?!cc).)*?\baa\b|\bbb\b(?:(?!aa).)*?\bcc\b))(?(?=.*\b(?:aa|bb|cc)\b).*(?<grp2>\b(?:aa|bb|cc)\b)|))|(?<grp2>\b(?:aa|bb|cc)\b))

 ^ 
 .*? 
 (?:
      # Force find:   (aa bb), (cc)
      #               (aa cc), (bb)
      #               (bb aa), (cc)
      #               (bb cc), (aa)
      (?:
           (?<grp1>                                     # GROUP1 
                (?:
                     \b aa \b (?:(?!cc).)*? \b bb \b 
                   |
                     \b aa \b (?:(?!bb).)*? \b cc \b 
                   |
                     \b bb \b (?:(?!cc).)*? \b aa \b 
                   |
                     \b bb \b (?:(?!aa).)*? \b cc \b 
                )
           )

           (?(?=      # Conditional assertion, find   (aa), (bb), (cc) 
                .*
                \b (?:  aa | bb | cc ) \b 
             )
                # The condition is true, so consume it
                .* 
                (?<grp2>                                # GROUP2
                     \b (?:  aa | bb | cc ) \b
                )
             |  # The condition is false, match nothing  
           )

      )
   | 
      # Or, 
      # Force find:   (), (aa)
      #               (), (bb)
      #               (), (cc)

      (?<grp2>                      # GROUP2 
           \b (?:  aa | bb | cc ) \b
      )
 )
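
The permutation pattern can be exercised the same way. This is a sketch under the same assumptions as any Python port of a .NET-style pattern: the lookahead conditional is replaced by an optional group, and the duplicated `grp2` is split into the assumed names `grp2a` and `grp2b`. The helper `match_pairs` is hypothetical.

```python
import re

# Each branch of grp1 uses a tempered dot, e.g. (?:(?!cc).)*?, so the gap
# between the two keywords cannot run across the third keyword.
PAIR_PATTERN = re.compile(
    r"^.*?(?:"
    r"(?P<grp1>"
    r"\baa\b(?:(?!cc).)*?\bbb\b"
    r"|\baa\b(?:(?!bb).)*?\bcc\b"
    r"|\bbb\b(?:(?!cc).)*?\baa\b"
    r"|\bbb\b(?:(?!aa).)*?\bcc\b"
    r")"
    r"(?:.*(?P<grp2a>\b(?:aa|bb|cc)\b))?"   # optional trailing keyword
    r"|(?P<grp2b>\b(?:aa|bb|cc)\b)"         # else a single keyword alone
    r")"
)

def match_pairs(text):
    """Return (group1, group2), or None when nothing matches."""
    m = PAIR_PATTERN.match(text)
    if m is None:
        return None
    return m.group("grp1"), m.group("grp2a") or m.group("grp2b")

print(match_pairs("aa cc bb"))   # ('aa cc', 'bb')
print(match_pairs("bb cc aa"))   # ('bb cc', 'aa')
print(match_pairs("bb"))         # (None, 'bb')
```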
License: CC-BY-SA with attribution
Not affiliated with StackOverflow