Question

This is very simple regex yet, it runs for over 30 seconds on a very short string: (i7 3970k @ 3.4ghz)

Pattern compile = Pattern.compile("^(?=[a-z0-9-]{1,63})([a-z0-9]+[-]{0,1}){1,63}[a-z0-9]{1}$");
Matcher matcher = compile.matcher("test-metareg-rw40lntknahvpseba32cßáàâåäæç.nl");
boolean matches = matcher.matches(); //Takes 30+ seconds

First part the (?=) is assertion that the string contains at max these characters

The 2nd part is assertion that the string doesn't exceed syntax for example on this case to prevent --'s and end at least in [a-z0-9]

Was it helpful?

Solution

I tried to guess your intention but it was not easy:

(?=[a-z0-9-]{1,63}) this look-ahead seem to intend to require the next up to 63 characters to be lowercase ASCII letters or numbers, but in fact, it will succeed even if there’s only one letter followed by anything. So maybe you meant (?=[a-z0-9-]{1,63}$) to forbid anything else after the legal up to 63 characters.

You seem to want groups of at least one letter or number between the - but you made the - optional not really creating a constraint and allowing way to much possibilities which created the overhead of your expression. You can simply say: ([a-z0-9]++-){0,63}[a-z0-9]+. The groups within the braces require at least one letter or number and require the minus after that, the expression at the end requires at least one letter or number at the end of the expression but will also match the last group without a following - at the same time. This last group might also be the only one if no - is contained in your text at all.

Putting it all together you regex becomes: (?=[a-z0-9-]{1,63}$)([a-z0-9]++-){0,63}[a-z0-9]+. Note that you don’t need a leading ^ or trailing $ if you use the matches method; it already implies that the string bounds must match the expression bounds.

I hope I got your intention right…

OTHER TIPS

I have fixed this regex replacing it as follows:

^(?=[a-z0-9-]{1,63})([a-z0-9]{0,1}|[-]{0,1}){1,63}[a-z0-9]{1}$

The section ([a-z0-9]+[-]{0,1}){1,63} became: ([a-z0-9]{0,1}|[-]{0,1}){1,63}

  • If you want to make sure that there is no -- in your string just use negative look ahead (?!.*--).
  • Also there is no point in writing {1}.
  • Another thing is if you want to ensure that string has max 63 characters then in your look-ahead you need to add $ at the end (?=[a-z0-9-]{1,63}$).

So maybe ^(?=[a-z0-9-]{1,63}$)(?!.*--)[a-z0-9-]+[a-z0-9]$

I think from what you say, your regex can be simplified to this
Edit - (For posterity) After reading @Holger's post, I am changing this to fix possible catastrophic backtracking, and to speed it up, which as my benches show is possibly the fastest way to do it.

 #  ^(?=[a-z0-9-]{1,63}$)[a-z0-9]++(?:-[a-z0-9]+)*+$

 ^                                    # BOL
 (?= [a-z0-9-]{1,63} $ )              # max 1 - 63 of these characters
 [a-z0-9]++ (?: - [a-z0-9]+ )*+       # consume the characters in this order
 $                                    # EOL
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top