The parser returned by pattern only looks at a single character. Have a look at the tests for some examples.
A first approximation of the regular expression \b\w+\b would be:
word().neg() & word().plus() & word().not()
However, this requires a non-word character at the beginning of the parsed string, because word().neg() consumes one character rather than merely checking for a boundary. You can avoid this problem by removing word().neg() and making sure that the caller starts at a valid place.
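As a rough sketch of that second variant, assuming the Dart petitparser package (wholeWord and matchWord are hypothetical names used only for illustration), the parser without word().neg() could look like this:

```dart
import 'package:petitparser/petitparser.dart';

// Hypothetical helper: one or more word characters, followed by a lookahead
// that succeeds only if no further word character follows. The not()
// predicate consumes no input. The caller must start at a word boundary.
Parser<String> wholeWord() => (word().plus() & word().not()).flatten();

// Hypothetical helper: returns the matched word, or null if parsing fails.
String? matchWord(String input) {
  final result = wholeWord().parse(input);
  return result is Success ? result.value : null;
}

void main() {
  print(matchWord('hello world')); // hello
  print(matchWord(' hello'));      // null
}
```

Note that parse() starts at the beginning of the input, so the second call fails: nothing skips the leading space, which is exactly the responsibility that moves to the caller.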
The problem you describe is common when using parsing expression grammars. You can typically solve it by reordering the choices accordingly, or by using logical predicates such as and() and not(). For example, the Smalltalk grammar defines the token true as follows:
def('trueToken', _token('true') & word().not());
This prevents the token parser from accidentally consuming the first part of a variable called trueblood.
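The effect of the not() predicate can be tried in isolation. The following is only a sketch under assumptions: it uses the Dart petitparser package, string('true') stands in for the grammar's private _token('true') helper, and trueToken and startsWithTrueToken are hypothetical names.

```dart
import 'package:petitparser/petitparser.dart';

// Assumption: string('true') stands in for the grammar's private
// _token('true') helper, which is not shown here.
Parser trueToken() => string('true') & word().not();

// Hypothetical helper: true if the input starts with the keyword token.
bool startsWithTrueToken(String input) => trueToken().parse(input) is Success;

void main() {
  print(startsWithTrueToken('true'));      // true
  print(startsWithTrueToken('true;'));     // true
  print(startsWithTrueToken('trueblood')); // false
}
```

Because not() only looks ahead and consumes nothing, the character after the keyword is still available to whatever parser runs next.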