This is my first crack at parser generators and, consequently, at ANTLR. I'm using ANTLR v4 to generate a simple practice parser for Morse Code with the following extra rules:
- A letter (e.g., '...' [the letter 's']) can be denoted as capitalized if a '^' precedes it
  - e.g., '^...' denotes a capital 'S'
- Special characters can be embedded in parentheses
- Each encoded entity will be separated by whitespace
So I could encode the following sentence:
ABC a@b.com
as (with corresponding letters shown underneath):
```
^.- ^-... ^-.-. ( ) .- (@) -... (.) -.-. --- --
A   B     C     ' ' a  '@' b    '.' c    o   m
```
Note in particular the two entities '( )' (which denotes a space) and '(.)' (which denotes a period).
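To make the encoding rules concrete, here is a minimal Python sketch of an encoder. The names LETTER_CODES, MORSE and encode are hypothetical, not part of the question. It assumes standard international Morse Code for the letters a to z only, and treats every other character (including the space) as a special wrapped in parentheses.

```python
# International Morse Code for the letters a to z, in alphabetical order.
LETTER_CODES = (".- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- "
                "-. --- .--. --.- .-. ... - ..- ...- .-- -..- -.-- --..").split()
MORSE = dict(zip("abcdefghijklmnopqrstuvwxyz", LETTER_CODES))


def encode(text: str) -> str:
    """Encode text using the rules from the question (illustrative sketch)."""
    entities = []
    for ch in text:
        code = MORSE.get(ch.lower()) if ch.isascii() else None
        if code is not None:
            # A leading '^' marks a capital letter.
            entities.append("^" + code if ch.isupper() else code)
        else:
            # Assumption: every non-letter (space, '@', '.', digits, ...)
            # is a special character wrapped in parentheses.
            entities.append(f"({ch})")
    # Entities are separated by a single space.
    return " ".join(entities)


print(encode("ABC a@b.com"))  # ^.- ^-... ^-.-. ( ) .- (@) -... (.) -.-. --- --
```

The output matches the example encoding of "ABC a@b.com" entity for entity.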
There is mainly one thing I'm finding hard to wrap my head around: the same token can take on different meanings depending on whether or not it is inside parentheses. That is, I want to tell ANTLR to discard whitespace, but not in the '( )' case. Likewise, a Morse Code character consists of dots and dashes (periods and dashes), yet I don't want to consider the period in '(.)' as "any character".
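The context dependence described above can be pictured as a scanner with two states. The following Python sketch is hand-written, not ANTLR-generated, and the names tokenize, decode and LETTERS_BY_CODE are hypothetical. Outside parentheses it skips whitespace and collects an optional '^' plus a run of '.'/'-'. After '(' it takes exactly one character literally, whatever it is. ANTLR v4 offers a comparable state mechanism called lexer modes (mode, pushMode, popMode), which are available in separate lexer grammars.

```python
from typing import List, Tuple

# International Morse Code for the letters a to z, in alphabetical order.
LETTER_CODES = (".- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- "
                "-. --- .--. --.- .-. ... - ..- ...- .-- -..- -.-- --..").split()
MORSE = dict(zip("abcdefghijklmnopqrstuvwxyz", LETTER_CODES))
LETTERS_BY_CODE = {code: letter for letter, code in MORSE.items()}


def tokenize(source: str) -> List[Tuple[str, str]]:
    """Split encoded text into ("MORSE", code) and ("SPECIAL", char) tokens."""
    tokens: List[Tuple[str, str]] = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch == "(":
            # Inside parentheses: take exactly one character literally,
            # even if it is a space, '.', or '-'.
            if i + 2 >= n or source[i + 2] != ")":
                raise ValueError(f"malformed special at offset {i}")
            tokens.append(("SPECIAL", source[i + 1]))
            i += 3
        elif ch.isspace():
            # Outside parentheses: whitespace only separates entities.
            i += 1
        elif ch in "^.-":
            # Optional '^' followed by a run of dots and dashes.
            start = i
            if ch == "^":
                i += 1
            while i < n and source[i] in ".-":
                i += 1
            code = source[start:i]
            if code == "^":
                raise ValueError(f"'^' without a letter at offset {start}")
            tokens.append(("MORSE", code))
        else:
            raise ValueError(f"unexpected {ch!r} at offset {i}")
    return tokens


def decode(source: str) -> str:
    """Turn tokens back into text, honoring the '^' capital marker."""
    out = []
    for kind, text in tokenize(source):
        if kind == "SPECIAL":
            out.append(text)
            continue
        capital = text.startswith("^")
        code = text[1:] if capital else text
        letter = LETTERS_BY_CODE.get(code)
        if letter is None:
            raise ValueError(f"unknown Morse code {code!r}")
        out.append(letter.upper() if capital else letter)
    return "".join(out)


print(decode("^.- ^-... ^-.-. ( ) .- (@) -... (.) -.-. --- --"))  # ABC a@b.com
print(decode("^... --- ...(@)"))  # Sos@
```

Because a run of dots and dashes simply ends at the first other character, this sketch also accepts the test input '^... --- ...(@)', where no space separates '...' from '(@)'.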
Here is the grammar I have got so far:
```antlr
grammar MorseCode;

file: entity*;

entity:
      special
    | morse_char;

special: '(' SPECIAL ')';

morse_char: '^'? (DOT_OR_DASH)+;

SPECIAL : .; // match any character
DOT_OR_DASH : ('.' | '-');
WS : [ \t\r\n]+ -> skip; // we don't care about whitespace (or do we?)
```
When I try it against the following input:

```
^... --- ...(@)
```

I get the following output (from grun ... -tokens):
```
[@0,0:0='^',<1>,1:0]
[@1,1:1='.',<4>,1:1]
...
[@15,15:14='<EOF>',<-1>,1:15]
line 1:1 mismatched input '.' expecting DOT_OR_DASH
```
It seems there is trouble with ambiguity between SPECIAL and DOT_OR_DASH?
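One way to check that suspicion is to replay ANTLR's documented tie-breaking: at each position the lexer takes the rule with the longest match, and among equal-length matches the rule defined first wins. The Python sketch below simulates this with regexes standing in for the lexer rules. The helpers lex and literal and the lists QUESTION_ORDER and REORDERED are hypothetical. The implicit '(' ')' '^' literals are assumed to take priority over SPECIAL, as ANTLR does for literals used in parser rules. With the question's rule order, every '.', '-' and single space matches SPECIAL and another rule with length 1, so SPECIAL wins. That is consistent with the parser's complaint about '.' and with EOF getting token index 15, meaning all 15 input characters, spaces included, became tokens. Moving DOT_OR_DASH and WS ahead of SPECIAL fixes the bare dots but breaks '( )' and '(.)'.

```python
import re
from typing import List, Sequence, Tuple


def lex(source: str, rules: Sequence[Tuple[str, re.Pattern]],
        skip: Tuple[str, ...] = ("WS",)) -> List[Tuple[str, str]]:
    """Tokenize like an ANTLR lexer: longest match wins, ties go to the earlier rule."""
    tokens = []
    pos = 0
    while pos < len(source):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = pattern.match(source, pos)
            # Only a strictly longer match replaces the current best,
            # so among equal-length matches the earlier rule is kept.
            if m is not None and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches {source[pos]!r} at offset {pos}")
        if best_name not in skip:  # emulate '-> skip'
            tokens.append((best_name, source[pos:pos + best_len]))
        pos += best_len
    return tokens


def literal(text: str) -> Tuple[str, re.Pattern]:
    """A rule matching one fixed string, named like ANTLR's implicit tokens."""
    return (f"'{text}'", re.compile(re.escape(text)))


# Rule order of the grammar in the question (literals assumed first).
QUESTION_ORDER = [
    literal("("), literal(")"), literal("^"),
    ("SPECIAL", re.compile(".", re.DOTALL)),  # ANTLR '.': any single character
    ("DOT_OR_DASH", re.compile(r"[.-]")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]

# The same rules with DOT_OR_DASH and WS moved ahead of SPECIAL.
REORDERED = QUESTION_ORDER[:3] + [QUESTION_ORDER[4], QUESTION_ORDER[5], QUESTION_ORDER[3]]

for label, rules in (("question order", QUESTION_ORDER), ("reordered", REORDERED)):
    print(label)
    print("  ", lex("^... --- ...(@)", rules))
    print("  ", lex("( ) (.)", rules))
```

Neither order gives the right tokens for both inputs, which is the context problem described in the question: a single flat list of lexer rules cannot treat '.' and ' ' differently inside and outside parentheses.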