When you don't want to match the token INCH when it's part of another word, you'll need to match words and skip them:
WORD
: [a-zA-Z]+ -> skip
;
Just be sure to place this rule after your INCH rule. When two lexer rules match the same number of characters, the rule defined first wins, so with WORD first it would match the input "in" as a word too (which you obviously don't want). You'll also want to expand the character set this rule matches: ASCII letters alone won't suffice.
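To make the ordering issue concrete, here is a small Python sketch that mimics how the lexer picks a token: the longest match wins, and on a tie the rule listed first wins. It is only a simplified model, not ANTLR itself. The INCH pattern is an assumption (your real rule may differ), `tokenize` is a made-up helper, and WORD tokens are kept rather than skipped so you can see what was matched.

```python
import re

# Hypothetical stand-ins for the grammar's lexer rules; the real INCH rule
# is not shown in this answer, so this pattern is only an assumption.
INCH = ("INCH", re.compile(r"[Ii]n(ch(es)?)?"))
WORD = ("WORD", re.compile(r"[a-zA-Z]+"))


def tokenize(text, rules):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when lengths are equal.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            pos += 1  # no rule matches here (digits, spaces): move on
            continue
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


inch_first = [INCH, WORD]  # INCH defined before WORD
word_first = [WORD, INCH]  # WORD defined before INCH

print(tokenize("an in", inch_first))    # [('WORD', 'an'), ('INCH', 'in')]
print(tokenize("an in", word_first))    # [('WORD', 'an'), ('WORD', 'in')]
print(tokenize("winches", inch_first))  # [('WORD', 'winches')]
```

With INCH first, a standalone "in" becomes an INCH token, while "winches" is still consumed as one WORD because the longer match beats the shorter INCH match.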
Also, [I|i] matches the pipe character as well: use [Ii] instead.
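Character sets in ANTLR lexer rules behave like regex character classes in this respect: inside the brackets, | is just another literal character, not an "or". A quick way to convince yourself is with Python's re module, used here purely as an analogy (it is not ANTLR, and the variable names are made up for the example):

```python
import re

pipe_class = re.compile(r"[I|i]")   # mirrors the set [I|i]
plain_class = re.compile(r"[Ii]")   # mirrors the set [Ii]

candidates = "Ii|"

# Collect every candidate character that each class accepts.
accepted_with_pipe = [c for c in candidates if pipe_class.fullmatch(c)]
accepted_plain = [c for c in candidates if plain_class.fullmatch(c)]

print(accepted_with_pipe)  # ['I', 'i', '|']
print(accepted_plain)      # ['I', 'i']
```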
Although correct:
include_metric_units
: imperial_types
| include_metric_units imperial_types
;
it's rather LR/Bison-esque. More readable would be to write:
include_metric_units
: imperial_types+
;
And to match tokens that might be in the token stream but are not matched by any of your productions, simply match any token in your top-level rule:
parse
 : ( include_metric_units // match metrics
   | .                    // or any "dangling" single token
   )*                     // zero or more times
   EOF                    // end of the input
 ;

include_metric_units
 : imperial_types+
 ;
Yes, that is correct: the . (DOT) inside a production/parser rule matches a single token, not a single character. It only matches a single character in lexer rules.
When I now parse the input
A whiteboard with 1550 square inches of writing space, and
a touchscreen measuring 775 square inches and an in at the end...
(note the 'in' at the end!), I get the following parse tree: