According to the Wikipedia page, MUMPS doesn't have reserved words:
Reserved words: None. Since MUMPS interprets source code by context, there is no need for reserved words. You may use the names of language commands as variables.
Lexer rules like Command_KILL function exactly like reserved words: they're designed to make sure no other token is generated when the input "kill" is encountered. So token type Command_KILL will always be produced on "kill", even if it's intended to be an identifier. You can keep the command lexer rules if you want, but you'll have to treat them like IDs as well, because you can't tell what "kill" refers to based on the token alone.
Making a MUMPS implementation in ANTLR means focusing on token usage and context rather than token types. Consider this grammar:
grammar Example;
document : (expr (EOL|EOF))+;
expr : command=ID Space+ value (Space* COMMA Space* value)* #CallExpr
     | command=ID Space+ name=ID Space* Equal Space* value #SetExpr
     ;
value : ID | INT;
INT : [0-9]+;
ID : [a-zA-Z][a-zA-Z0-9]*;
Space : ' ';
Equal : '=';
EOL : [\r\n]+;
COMMA : ',';
Parser rule expr knows when an ID token is a command based on the layout of the entire line.
- If the input tokens are ID ID, then the input is a CallExpr: the first ID is a command name and the second ID is a regular identifier.
- If the input tokens are ID ID Equal ID, then the input is a SetExpr: the first ID will be a command (either "set" or something like it), the second ID is the target identifier, and the third ID is the source identifier.
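The same idea can be sketched by hand, without ANTLR. This is only an illustration of classifying a line by its layout, not the generated parser: the class LineShape and its methods are hypothetical names, and it assumes the simplified two-shape line format of the grammar above. Every word is treated as a plain identifier; only the position of "=" decides which role each word plays.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ANTLR: classifies one line by its layout alone.
final class LineShape {

    // Splits a line into words plus "=" and "," tokens; spaces are dropped.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : line.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                word.append(c);
                continue;
            }
            if (word.length() > 0) {
                out.add(word.toString());
                word.setLength(0);
            }
            if (c == '=' || c == ',') {
                out.add(String.valueOf(c));
            }
        }
        if (word.length() > 0) {
            out.add(word.toString());
        }
        return out;
    }

    // Returns "set" for the shape "word word = word"; anything else is
    // reported as "call" (this sketch does no error handling).
    static String classify(String line) {
        List<String> t = tokens(line);
        return t.size() == 4 && t.get(2).equals("=") ? "set" : "call";
    }

    public static void main(String[] args) {
        // "new", "set" and "kill" are never special-cased by name.
        System.out.println(classify("new new, set, kill"));
        System.out.println(classify("set kill = new"));
    }
}
```

Note that no command name appears anywhere in the classification logic, which is the point: the word "kill" is an argument in the first line and an assignment target in the second.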
Here's a Java test application followed by a test case similar to the one mentioned in your question.
import java.util.List;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;

public class ExampleTest {
    public static void main(String[] args) {
        ANTLRInputStream input = new ANTLRInputStream(
                "new new, set, kill\nset kill = new");
        ExampleLexer lexer = new ExampleLexer(input);
        ExampleParser parser = new ExampleParser(new CommonTokenStream(lexer));
        parser.addParseListener(new ExampleBaseListener() {
            @Override
            public void exitCallExpr(ExampleParser.CallExprContext ctx) {
                System.out.println("Call:");
                System.out.printf("\tcommand = %s%n", ctx.command.getText());
                List<ExampleParser.ValueContext> values = ctx.value();
                if (values != null) {
                    for (int i = 0, count = values.size(); i < count; ++i) {
                        ExampleParser.ValueContext value = values.get(i);
                        System.out.printf("\targ[%d] = %s%n", i,
                                value.getText());
                    }
                }
            }

            @Override
            public void exitSetExpr(ExampleParser.SetExprContext ctx) {
                System.out.println("Set:");
                System.out.printf("\tcommand = %s%n", ctx.command.getText());
                System.out.printf("\tname = %s%n", ctx.name.getText());
                System.out.printf("\tvalue = %s%n", ctx.value().getText());
            }
        });
        parser.document();
    }
}
Input
new new, set, kill
set kill = new
Output
Call:
	command = new
	arg[0] = new
	arg[1] = set
	arg[2] = kill
Set:
	command = set
	name = kill
	value = new
It's up to the calling code to determine whether a command is valid in a given context. The parser can't reasonably handle this because of MUMPS's loose approach to commands and identifiers. But it's not as bad as it may sound: you'll know which commands function like a call and which function like a set, so you'll be able to test the input from the Listener that ANTLR produces. In the code above, for example, it would be very easy to test whether "set" was the command passed to exitSetExpr.
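As a rough sketch of such a check, the helper below decides whether a command name is acceptable for each statement shape. The class CommandCheck and its command lists are hypothetical and deliberately incomplete. It also assumes that command names are matched case-insensitively and that single-letter abbreviations are accepted; adjust both to whatever your MUMPS dialect requires.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: validates the command name after the parser has
// already decided which shape (call or set) the line has.
final class CommandCheck {

    // Assumed, incomplete command lists, for illustration only.
    private static final Set<String> SET_LIKE = Set.of("set", "s");
    private static final Set<String> CALL_LIKE = Set.of("new", "n", "kill", "k");

    // True if the text names a command that is valid in a SetExpr.
    static boolean isSetCommand(String text) {
        return SET_LIKE.contains(text.toLowerCase(Locale.ROOT));
    }

    // True if the text names a command that is valid in a CallExpr.
    static boolean isCallCommand(String text) {
        return CALL_LIKE.contains(text.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isSetCommand("SET"));   // accepted as a set-like command
        System.out.println(isSetCommand("kill"));  // a valid word, but not set-like
        System.out.println(isCallCommand("kill")); // accepted as a call-like command
    }
}
```

Inside exitSetExpr, a check like this would be called with ctx.command.getText(), and the listener could report an error when it returns false.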
Some MUMPS syntax may be more difficult to process than this, but the general approach will be the same: let the lexer treat commands and identifiers like IDs, and use the parser rules to determine whether an ID refers to a command or an identifier based on the context of the entire line.