Before fixing the problem with StackOverflowError...
I would like to point out that your current regex
(\d\*?(;(?=\d))?)+
fails to validate this condition: "Each value could include one * at the end (used as wildcard for other matching)".
It fails to reject the case
23*4*4*;34*434*34
, as seen here (see footnote 1).
Your regex will also do unnecessary backtracking on a non-matching input.
Java uses one stack frame for each repetition of the group
(\d\*?(;(?=\d))?)
(which is repeated one or more times by the + quantifier), so a long enough input exhausts the stack.
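The first point can be checked directly in Java. The following is a minimal sketch; the class and method names are illustrative and not from the question, only the regex and the sample input are taken from the text above.

```java
import java.util.regex.Pattern;

// Illustrative class name (an assumption); the regex and input come from the text above.
final class OriginalRegexCheck {
    // The regex from the question, written as a Java string literal.
    static final Pattern ORIGINAL = Pattern.compile("(\\d\\*?(;(?=\\d))?)+");

    // Returns true if the whole input matches the original regex.
    static boolean acceptedByOriginal(String input) {
        return ORIGINAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Values such as 23*4*4* contain more than one '*', so this input
        // should be rejected, yet the original regex accepts it.
        System.out.println(acceptedByOriginal("23*4*4*;34*434*34")); // prints true
    }
}
```

The regex accepts this input because every repetition of the group consumes a single digit, so a "value" is never treated as one unit and a * may follow any digit.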
A correct regex would be:
\d+\*?(?:;\d+\*?)*
Note that this will reject a lone *; it is not clear from your requirement whether you want to accept or reject it.
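To see how the corrected regex behaves, here is a small sketch that runs it against a few inputs. The class name, the method name and all sample strings other than 23*4*4*;34*434*34 are made up for illustration.

```java
import java.util.regex.Pattern;

// Illustrative class name (an assumption); the regex is the corrected one from above.
final class CorrectedRegexCheck {
    static final Pattern CORRECTED = Pattern.compile("\\d+\\*?(?:;\\d+\\*?)*");

    // Returns true if the whole input is a ';'-separated list of values,
    // each being one or more digits with at most one trailing '*'.
    static boolean isValid(String input) {
        return CORRECTED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("23*;34;5*"));          // true
        System.out.println(isValid("23*4*4*;34*434*34"));  // false: '*' inside a value
        System.out.println(isValid("*"));                  // false: wildcard without digits
        System.out.println(isValid("12;"));                // false: trailing separator
    }
}
```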
This doesn't fix the StackOverflowError problem, since each repetition of the group (?:;\d+\*?)
is also going to use up stack space. To fix that, make all quantifiers possessive; there is no need for backtracking, since the grammar is not ambiguous:
\d++\*?+(?:;\d++\*?+)*+
As a Java string literal:
"\\d++\\*?+(?:;\\d++\\*?+)*+"
I have tested the regex above with both matching and non-matching inputs of more than 3600 tokens (separated by ;).
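A test along those lines could look like the sketch below. It is not the exact test I ran: the class name, the helper buildInput, the shape of the generated tokens and the count of 4000 are all assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

// Illustrative stress test (names and token shape are assumptions).
final class PossessiveStressTest {
    static final Pattern POSSESSIVE =
            Pattern.compile("\\d++\\*?+(?:;\\d++\\*?+)*+");

    // Builds tokenCount four-digit tokens joined by ';'.
    // Every token at an even index gets a trailing '*'.
    static String buildInput(int tokenCount) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokenCount; i++) {
            if (i > 0) {
                sb.append(';');
            }
            sb.append(1000 + i % 9000);
            if (i % 2 == 0) {
                sb.append('*');
            }
        }
        return sb.toString();
    }

    static boolean isValid(String input) {
        return POSSESSIVE.matcher(input).matches();
    }

    public static void main(String[] args) {
        String matching = buildInput(4000);
        // Appending "**" leaves an extra '*' after the last value, which is invalid.
        String nonMatching = matching + "**";
        System.out.println(isValid(matching));    // true
        System.out.println(isValid(nonMatching)); // false
    }
}
```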
Footnote
1: regex101 uses PCRE flavor, which is slightly different from Java regex flavor. However, the features used in your regex are common between them, so there should be no discrepancy.
Appendix
Actually, from my testing with your regex
(\d\*?(;(?=\d))?)+
(which is incorrect according to your requirement), making the outermost + possessive (++) seems to fix the StackOverflowError problem, at least in my testing with around 3600 tokens (separated by ;, in a string around 20k characters long). It also doesn't seem to cause a long execution time when testing against a non-matching string.
In my solution, making the * quantifier for the group (?:;\d+\*?) possessive is enough to resolve the StackOverflowError:
"\\d+\\*?(?:;\\d+\\*?)*+"
However, I made every quantifier possessive to be on the safe side.
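As a quick sanity check, the two variants can be compared on the same inputs. This is only a sketch: the class name, the method name and the sample inputs (apart from 23*4*4*;34*434*34) are assumptions, and agreement on a handful of strings is an illustration, not a proof of equivalence.

```java
import java.util.Collections;
import java.util.regex.Pattern;

// Illustrative comparison of the two variants (names are assumptions).
final class PossessiveVariants {
    // Only the quantifier on the repeated group is possessive.
    static final Pattern GROUP_ONLY = Pattern.compile("\\d+\\*?(?:;\\d+\\*?)*+");
    // Every quantifier is possessive.
    static final Pattern FULL = Pattern.compile("\\d++\\*?+(?:;\\d++\\*?+)*+");

    // Returns true if both variants give the same verdict for the input.
    static boolean agree(String input) {
        return GROUP_ONLY.matcher(input).matches() == FULL.matcher(input).matches();
    }

    public static void main(String[] args) {
        // A long valid input: 3600 copies of "12*" joined by ';'.
        String longInput = String.join(";", Collections.nCopies(3600, "12*"));
        String[] samples = {"23*;34;5*", "23*4*4*;34*434*34", "*", longInput, longInput + ";"};
        for (String s : samples) {
            System.out.println(agree(s)); // true for each sample
        }
    }
}
```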