How to match `\b` in regex in PetitParserDart?

https://stackoverflow.com/questions/17428108

02-06-2022

Question

\b is the "word boundary" in regular expressions. How can I match it in PetitParserDart?

I tried:

pattern("\b") & word().plus() & pattern("\b")

But it doesn't match anything. The pattern I want is the equivalent of the regular expression \b\w+\b.

My real problem is:

I want to treat render as a token only if it is a standalone word.

The following should match:

render
to render the page
render()
@render[it]

The following should not:

rerender
rendering
render123

I can't use string("render").trim() here, since it will eat up the spaces around the word. So I want \b, but it does not seem to be supported by PetitParserDart.
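
As a reference point, the behaviour asked for can be checked with Dart's built-in RegExp, which does support \b. This is only a minimal sketch using dart:core, not PetitParserDart; the name hasStandaloneRender is made up for illustration.

```dart
// Plain dart:core, not PetitParserDart. The function name is hypothetical.
final RegExp standaloneRender = RegExp(r'\brender\b');

/// True when [input] contains "render" as a standalone word.
bool hasStandaloneRender(String input) => standaloneRender.hasMatch(input);

void main() {
  // Inputs taken from the two lists in the question.
  const shouldMatch = ['render', 'to render the page', 'render()', '@render[it]'];
  const shouldNotMatch = ['rerender', 'rendering', 'render123'];

  for (final s in shouldMatch) {
    print('$s -> ${hasStandaloneRender(s)}');
  }
  for (final s in shouldNotMatch) {
    print('$s -> ${hasStandaloneRender(s)}');
  }
}
```

The first four inputs report true and the last three report false, which is the behaviour the parser should reproduce.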

Solution

The parser returned by pattern only looks at a single character. Have a look at the tests for some examples.

A first approximation of the regular expression \b\w+\b would be:

word().neg() & word().plus() & word().not()

However, this requires a non-word character at the beginning of the parsed string. You can avoid this problem by removing word().neg() and making sure that the caller starts at a valid place.
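
To illustrate what "starting at a valid place" could mean, here is a hand-rolled sketch in plain Dart. It does not use PetitParserDart; isWordAt and matchWord are hypothetical names, and the comments only indicate which combinator each step stands in for.

```dart
// Hand-rolled sketch, not PetitParserDart API. All names are hypothetical.
final RegExp wordChar = RegExp(r'\w');

/// True when [i] is inside [input] and points at a word character.
bool isWordAt(String input, int i) =>
    i >= 0 && i < input.length && wordChar.hasMatch(input[i]);

/// Returns the end index of the word starting at [start], or -1 when
/// [start] is in the middle of a word or no word begins there.
int matchWord(String input, int start) {
  // Caller-side guard that replaces word().neg(): the previous
  // character, if there is one, must not be a word character.
  if (isWordAt(input, start - 1)) return -1;

  // Stand-in for word().plus(): consume one or more word characters.
  var end = start;
  while (isWordAt(input, end)) {
    end++;
  }
  if (end == start) return -1;

  // Stand-in for word().not(): the next character must not be a word
  // character. After the greedy loop this already holds, and nothing
  // further is consumed.
  return end;
}

void main() {
  print(matchWord('render', 0)); // start of input is a valid place
  print(matchWord('to render', 3)); // preceded by a space
  print(matchWord('rerender', 2)); // preceded by 'e', so rejected
}
```

Because the guard only inspects the previous character, the match works at the very start of the input, where a leading word().neg() would have nothing to consume.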

The problem you describe is common when using parsing expression grammars. You can typically solve it by reordering the choices, or by using logical predicates such as and() and not(). For example, the Smalltalk grammar defines the token true as follows:

def('trueToken', _token('true') & word().not());

This prevents the token parser from accidentally consuming the first part of a variable called trueblood.
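
The same lookahead idea can be sketched without the library. The following is an assumption-laden illustration in plain Dart, not PetitParserDart code: matchKeyword is a hypothetical helper that mimics a keyword followed by word().not().

```dart
// Hand-rolled sketch, not PetitParserDart API. All names are hypothetical.
final RegExp wordChar = RegExp(r'\w');

/// Mimics `string(keyword) & word().not()` at position [pos], which must
/// lie within 0..input.length. Returns the index just past the keyword,
/// or -1 when the keyword is absent or runs on into a longer word.
int matchKeyword(String input, int pos, String keyword) {
  if (!input.startsWith(keyword, pos)) return -1;
  final end = pos + keyword.length;
  // Negative lookahead: fail if a word character follows, but never
  // consume it. End of input counts as "no word character".
  if (end < input.length && wordChar.hasMatch(input[end])) return -1;
  return end;
}

void main() {
  print(matchKeyword('true', 0, 'true')); // keyword at end of input
  print(matchKeyword('trueblood', 0, 'true')); // rejected by the lookahead
  print(matchKeyword('render()', 0, 'render')); // '(' is not a word character
  print(matchKeyword('render123', 0, 'render')); // rejected by the lookahead
}
```

The lookahead only guards the right-hand side. A case like rerender still depends on the caller not starting a match in the middle of a word.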

Licensed under: CC-BY-SA with attribution
