Question

I'm using wrappers from Byte Comb (http://bytecomb.com/regular-expressions-in-vba/). They seem to be working very well. I need help formulating robust patterns.

I get unexpected results when combining a lookahead "(?=...)" with the or operator "|".

Input Text String               Pattern                              RxMatch
-----------------               -------                              -------
iraq                            q(?!u)                               q
quit                            q(?!u)                               0
iraq                            q(?=u)                               0
quit                            q(?=u)                               q
sta.23.5  .1 words 67.89  ch    \d+\.?\d*|\.\d+(?=\s*ch)             23.5
sta.23.5  .1 words 67.89  ch    (\d+\.?\d*)|(\.\d+)(?=\s*ch)         23.5
sta.23.5  .1 words 67.89  ch    \d+\.?\d*(?=\s*ch)                   67.89
sta.23.5  .1 words 67.89  ch    \d+\.?\d*(?=\s*ch)|\.\d+(?=\s*ch)    67.89
sta.23.5  .1 words .89  ch      \d+\.?\d*|\.\d+(?=\s*ch)             23.5
sta.23.5  .1 words .89  ch      (\d+\.?\d*)|(\.\d+)(?=\s*ch)         23.5
sta.23.5  .1 words .89  ch      \d+\.?\d*(?=\s*ch)                   89
sta.23.5  .1 words .89  ch      \d+\.?\d*(?=\s*ch)|\.\d+(?=\s*ch)    .89

"iraq" and "quit" work as expected. From the second input string I hope to extract "67.89", and from the third, ".89". Initially, I formulated \d+\.?\d*|\.\d+ as a pattern for a decimal number, to cover both situations. Adding parentheses did not help. Removing the or helped for 67.89. Finally I found a working solution (the last pattern in the table). But is there something better? Can you help me understand the order of precedence? If possible, I'd like to keep the two parts of the or together.

Thanks, Not-a-programmer!

Solution

\d+\.?\d*|\.\d+(?=\s*ch) applied to "sta.23.5  .1 words 67.89  ch" returns 23.5 because the search runs left to right and the first alternative, \d+\.?\d*, matches there without needing the lookahead.

The or "|" has the lowest precedence of all the regex operators, so it splits the whole pattern into two alternatives: \d+\.?\d* and \.\d+(?=\s*ch). The lookahead belongs only to the second alternative.
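
The split can be checked directly. The sketch below uses Python's re module as a stand-in, on the assumption that it handles alternation and lookahead the same way as VBScript's RegExp for these particular patterns. first_match is a hypothetical helper, not one of the Byte Comb wrappers. The grouped pattern is one possible way to keep both halves of the or together under a single lookahead; it relies on the non-capturing group (?:...), which VBScript's RegExp syntax also lists.

```python
import re

def first_match(pattern, text):
    """Return the text of the leftmost match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

TEXT_A = "sta.23.5  .1 words 67.89  ch"
TEXT_B = "sta.23.5  .1 words .89  ch"

# Ungrouped: "|" splits the whole pattern, so the lookahead is attached
# only to the second alternative and the first one is free to match "23.5".
UNGROUPED = r"\d+\.?\d*|\.\d+(?=\s*ch)"

# Grouped: the non-capturing group limits the scope of "|", so the
# lookahead applies to whichever alternative matched.
GROUPED = r"(?:\d+\.?\d*|\.\d+)(?=\s*ch)"

print(first_match(UNGROUPED, TEXT_A))  # 23.5
print(first_match(GROUPED, TEXT_A))    # 67.89
print(first_match(GROUPED, TEXT_B))    # .89
```

If the grouped pattern behaves the same way in VBA, it would give both desired results with one pattern and no submatches.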

If you want to prevent \d+\.?\d* from matching 23.5, you have to add an extra criterion, such as requiring a whitespace character before it, and use capturing parentheses to get the number as a submatch: \s(\d+\.?\d*)

You could match both cases with the pattern \s(\d+\.?\d*)|\.\d+(?=\s*ch), but keep in mind that when the first half matches, the match includes the leading whitespace, so the actual value is in the submatch.
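
As a rough illustration of reading the value from either the submatch or the whole match, here is a minimal sketch in Python's re module, again assuming it treats this pattern the same way as VBScript's RegExp. extract_number is a hypothetical helper; group(1) plays the role that SubMatches(0) would play on a VBScript Match object.

```python
import re

# First half captures the number after a whitespace character;
# second half matches a leading-dot number that is followed by "ch".
PATTERN = re.compile(r"\s(\d+\.?\d*)|\.\d+(?=\s*ch)")

def extract_number(text):
    """Return the number from the first match, or None if nothing matches."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # group(1) is set only when the first half matched; in that case the
    # whole match also contains the leading whitespace, so use the submatch.
    if m.group(1) is not None:
        return m.group(1)
    return m.group(0)

print(extract_number("sta.23.5  .1 words 67.89  ch"))  # 67.89
print(extract_number("sta.23.5  .1 words .89  ch"))    # .89
```

Note that the first half carries no lookahead, so it accepts any number preceded by whitespace, whether or not "ch" follows.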

The real problem here is that VBScript's RegExp class doesn't support lookbehind, just lookahead.

Licensed under: CC-BY-SA with attribution