Question

I currently have the regex below, which I believe works, but it is quite long. I'm using C# regex.

^(:?J)$|^(:?J)$|^(:?F)$|^(:?M)$|^(?:A)$|^(?:A)$|^(?:S)$|^(?:O)$|^(?:N)$|^(?:D)$|^(:?JA)$|^(:?JU)$|^(:?FE)$|^(:?MA)$|^(?:AP)$|^(?:AU)$|^(?:SE)$|^(?:OC)$|^(?:NO)$|^(?:DE)$|^(:?JAN)$|^(:?FEB)$|^(:?MAR)$|^(:?APR)$|^(?:MAY)$|^(?:JUN)$|^(?:JUL)$|^(?:AUG)$|^(?:SEP)$|^(?:OCT)$|^(?:NOV)$|^(?:DEC)$

Is there any way to make this shorter? I think it is already pretty straightforward but if there is a way to combine what I have here into a shorter regex that is what I am after.

I need it to match the combinations of first letter only, first and second, and all three letters of the month abbreviations.

First Letter only. ^(:?J)$|^(:?J)$|^(:?F)$|^(:?M)$|^(?:A)$|^(?:A)$|^(?:S)$|^(?:O)$|^(?:N)$|^(?:D)$

First and Second letter combinations are matched by this. ^(:?JA)$|^(:?JU)$|^(:?FE)$|^(:?MA)$|^(?:AP)$|^(?:AU)$|^(?:SE)$|^(?:OC)$|^(?:NO)$|^(?:DE)$

Full abbreviations: |^(:?JAN)$|^(:?FEB)$|^(:?MAR)$|^(:?APR)$|^(?:MAY)$|^(?:JUN)$|^(?:JUL)$|^(?:AUG)$|^(?:SEP)$|^(?:OCT)$|^(?:NOV)$|^(?:DEC)$

Then I combined those regexes into the one at the top, which now works as I intend it to. However, it is still rather huge and I imagine it can be improved.


Solution

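First, I want to point out that your regex does not make sense as written: (:?J) is a capturing group that matches an optional colon followed by J, whereas a non-capturing group is written (?:J). Please go here and here for more information.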

For your problem, you can try this:

J(AN?)?|F(EB?)?|M(AR?)?|...

or better with non capturing groups:

J(?:AN?)?|F(?:EB?)?|M(?:AR?)?|...

You don't need to use any character class here, but you can use alternations, groups, and question mark quantifiers.

If you want to match the beginning and the end of the string, you can write it like this:

^(?:J(?:AN?)?|F(?:EB?)?|M(?:AR?)?|...)$
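As a rough C# sketch of the anchored pattern above: the "..." is filled in here as an assumption, using the English abbreviations from the question and merging months that share a first letter (JAN/JUN/JUL, MAR/MAY, APR/AUG). The class name MonthPrefix and the method IsMatch are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MonthPrefix
{
    // Assumed expansion of the "..." in the pattern above, built from the
    // month abbreviations listed in the question (not given in the answer).
    public const string Pattern =
        "^(?:J(?:AN?|U[NL]?)?|F(?:EB?)?|M(?:A[RY]?)?|A(?:PR?|UG?)?" +
        "|S(?:EP?)?|O(?:CT?)?|N(?:OV?)?|D(?:EC?)?)$";

    private static readonly Regex Matcher =
        new Regex(Pattern, RegexOptions.CultureInvariant);

    // True when the input is a 1-, 2- or 3-letter prefix of a month abbreviation.
    public static bool IsMatch(string input) => Matcher.IsMatch(input);

    public static void Main()
    {
        // A few valid prefixes followed by a few strings that should be rejected.
        foreach (var s in new[] { "J", "JU", "JUL", "MA", "MAY", "AU", "JX", "JANU", "" })
        {
            Console.WriteLine($"{s,-5} {IsMatch(s)}");
        }
    }
}
```

Note that in .NET, $ also matches just before a final newline, so "JAN\n" would be accepted; replacing $ with \z avoids that if it matters for your input.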

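For better performance, you can use this pattern, which uses atomic groups and possessive quantifiers (possessive quantifiers such as ?+ are supported by PCRE and Java, but the .NET engine only supports atomic groups):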

^(?>J(?>AN?+)?|F(?>EB?+)?|M(?>AR?+)?|...|D(?>EC?+)?)$

You can also exploit the letters shared between month names, like this, so that non-matching input fails quickly:

^(?>J(?>AN?+|U[NL]?+)?|F(?>EB?+)?|M(?>A[RY]?+)?|A(?>PR?+|UG?+)?|S(?>EP?+)?|O(?>CT?+)?|N(?>OV?+)?|D(?>EC?+)?)$

What does the regex engine do? Here is an example with the last pattern:

My example string is AU (for AUGUSTUS)

 ^(?>             # an atomic group is at the beginning of the pattern
                  # in this group there is an alternation J...|F...|M...
                  # up to the closing parenthesis of the atomic group
 )$               # at the end of the string

What the regex engine tries:

^   ^   # the engine is at the beginning of the string, all is fine
A   J   # the engine tries J, fails, then goes to the next alternative
A   F   # tests F and fails ...
A   M   # ...
A   A   # tests A and succeeds
U   P   # tests P, fails, and goes to the next alternative
U   U   # tests U and succeeds
$   G   # does not test anything: the engine knows that it is at the end of the string!
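Since the question is about C#, here is a hedged sketch of the same idea using atomic groups only, because the .NET engine does not accept the ?+ syntax. A greedy ? at the end of an atomic group cannot be backtracked into once the group is left, so the possessive quantifiers are dropped here. The class name AtomicMonthPrefix, the method IsMatch and the month list are assumptions made for illustration, and no performance measurement is implied.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AtomicMonthPrefix
{
    // Assumed .NET-friendly variant: atomic groups (?>...) only, no "?+".
    // Months come from the English abbreviations in the question.
    public const string Pattern =
        "^(?>J(?>AN?|U[NL]?)?|F(?>EB?)?|M(?>A[RY]?)?|A(?>PR?|UG?)?" +
        "|S(?>EP?)?|O(?>CT?)?|N(?>OV?)?|D(?>EC?)?)$";

    private static readonly Regex Matcher =
        new Regex(Pattern, RegexOptions.CultureInvariant);

    public static bool IsMatch(string input) => Matcher.IsMatch(input);

    public static void Main()
    {
        string[] months =
        {
            "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
            "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"
        };

        // Every 1-, 2- and 3-letter prefix of each abbreviation is checked.
        foreach (var month in months)
        {
            for (int len = 1; len <= month.Length; len++)
            {
                string prefix = month.Substring(0, len);
                Console.WriteLine($"{prefix,-4} {IsMatch(prefix)}");
            }
        }

        // Strings that are not prefixes of any abbreviation.
        foreach (var other in new[] { "AUX", "JN", "MARCH", "B" })
        {
            Console.WriteLine($"{other,-6} {IsMatch(other)}");
        }
    }
}
```

Running Main prints each prefix next to the result of the match, which makes it easy to spot an abbreviation that the pattern mishandles.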
Licensed under: CC-BY-SA with attribution