Question

I need to implement syntax highlighting for COS (aka MUMPS).
The language allows constructs of the form

new (new,set,kill)
set kill=new

where 'new' and 'set' are commands and also variable names.

grammar cos;

Command_KILL            :( ('k'|'K') | ( ('k'|'K')('i'|'I')('l'|'L')('l'|'L') ) ); 
Command_NEW             :( ('n'|'N') | ( ('n'|'N')('e'|'E')('w'|'W') ) ); 
Command_SET             :( ('s'|'S') | ( ('s'|'S')('e'|'E')('t'|'T') ) );


INT : [0-9]+;
ID : [a-zA-Z][a-zA-Z0-9]*;
Space: ' ';
Equal: '=';

newCommand
    :   Command_NEW Space ID
    ;
setCommand
    :   Command_SET Space ID Space*  Equal Space* INT
    ; 

My problem arises when an ID has the same name as a command (NEW, SET, etc.).

Solution

According to the Wikipedia page, MUMPS doesn't have reserved words:

Reserved words: None. Since MUMPS interprets source code by context, there is no need for reserved words. You may use the names of language commands as variables.

Lexer rules like Command_KILL function exactly like reserved words: they're designed to make sure no other token is generated when input "kill" is encountered. So token type Command_KILL will always be produced on "kill", even if it's intended to be an identifier. You can keep the command lexer rules if you want, but you'll have to treat them like IDs as well because you just don't know what "kill" refers to based on the token alone.
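
To make the reserved-word effect concrete, here is a minimal sketch in plain Java. It is not ANTLR-generated code: the class KeywordFirstLexer and its methods are hypothetical names used only for illustration, and it assumes tokens are separated by spaces. Like the lexer in the question, it decides each token's type from the token's text alone, with the command rules tried first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the question's lexer, not ANTLR-generated code.
// Each word gets its type from its text alone, with command rules tried first.
class KeywordFirstLexer {

    static String tokenType(String text) {
        if (text.matches("(?i)k(ill)?")) return "Command_KILL";
        if (text.matches("(?i)n(ew)?"))  return "Command_NEW";
        if (text.matches("(?i)s(et)?"))  return "Command_SET";
        if (text.matches("[0-9]+"))      return "INT";
        if (text.matches("[a-zA-Z][a-zA-Z0-9]*")) return "ID";
        if (text.equals("="))            return "Equal";
        return "UNKNOWN";
    }

    // Splits on spaces for brevity; a real lexer would also emit Space tokens.
    static List<String> tokenTypes(String line) {
        List<String> types = new ArrayList<>();
        for (String word : line.trim().split(" +")) {
            types.add(tokenType(word));
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(tokenTypes("set kill = 5"));
    }
}
```

Running main prints [Command_SET, Command_KILL, Equal, INT]. The second token is never an ID, so a rule such as setCommand, which expects Command_SET followed by ID, cannot match a variable named "kill".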

Making a MUMPS implementation in ANTLR means focusing on token usage and context rather than token types. Consider this grammar:

grammar Example;


document    : (expr (EOL|EOF))+;
expr        : command=ID Space+ value (Space* COMMA Space* value)*  #CallExpr
            | command=ID Space+ name=ID Space* Equal Space* value   #SetExpr
            ;     

value       : ID | INT;

INT         : [0-9]+;
ID          : [a-zA-Z][a-zA-Z0-9]*;
Space       : ' ';
Equal       : '=';
EOL         : [\r\n]+;
COMMA       : ',';

Parser rule expr knows when an ID token is a command based on the layout of the entire line.

  • If the input tokens are ID ID, then the input is a CallExpr: the first ID is a command name and the second ID is a regular identifier.
  • If the input tokens are ID ID Equal ID, then the input is a SetExpr: the first ID will be a command (either "set" or something like it), the second ID is the target identifier, and the third ID is the source identifier.
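
The two cases above can be sketched outside ANTLR as a plain Java check on the sequence of token types. This is only an illustration of the decision the expr rule makes: LineShape, tokenTypes, and classify are hypothetical names, the helper ignores Space tokens, and it silently skips characters that match no token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that mimics the decision made by the expr rule;
// it is not generated by ANTLR.
class LineShape {

    // Same token definitions as the Example grammar, minus Space and EOL.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<ID>[a-zA-Z][a-zA-Z0-9]*)|(?<INT>[0-9]+)|(?<EQ>=)|(?<COMMA>,)");

    // Returns the token types of one line; unmatched characters are skipped.
    static List<String> tokenTypes(String line) {
        List<String> types = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            if (m.group("ID") != null)       types.add("ID");
            else if (m.group("INT") != null) types.add("INT");
            else if (m.group("EQ") != null)  types.add("Equal");
            else                             types.add("COMMA");
        }
        return types;
    }

    private static boolean isValue(String type) {
        return type.equals("ID") || type.equals("INT");
    }

    static String classify(String line) {
        List<String> t = tokenTypes(line);
        if (t.size() < 2 || !t.get(0).equals("ID")) {
            return "Unknown";
        }
        // SetExpr: ID ID Equal value
        if (t.size() == 4 && t.get(1).equals("ID")
                && t.get(2).equals("Equal") && isValue(t.get(3))) {
            return "SetExpr";
        }
        // CallExpr: ID value (COMMA value)*
        for (int i = 1; i < t.size(); i++) {
            boolean expectValue = (i % 2 == 1);
            boolean ok = expectValue ? isValue(t.get(i)) : t.get(i).equals("COMMA");
            if (!ok) {
                return "Unknown";
            }
        }
        // The sequence must end on a value, not on a trailing comma.
        return (t.size() % 2 == 0) ? "CallExpr" : "Unknown";
    }
}
```

With this sketch, classify("new new, set, kill") returns "CallExpr" and classify("set kill = new") returns "SetExpr", even though every word in both lines is also a command name.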

Here's a Java test application followed by a test case similar to the one mentioned in your question.

import java.util.List;

import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;

public class ExampleTest {

    public static void main(String[] args) {

        // CharStreams (ANTLR 4.7+) replaces the deprecated ANTLRInputStream.
        CharStream input = CharStreams.fromString(
                "new new, set, kill\nset kill = new");

        ExampleLexer lexer = new ExampleLexer(input);

        ExampleParser parser = new ExampleParser(new CommonTokenStream(lexer));

        parser.addParseListener(new ExampleBaseListener() {
            @Override
            public void exitCallExpr(ExampleParser.CallExprContext ctx) {
                System.out.println("Call:");
                System.out.printf("\tcommand = %s%n", ctx.command.getText());
                List<ExampleParser.ValueContext> values = ctx.value();
                if (values != null) {
                    for (int i = 0, count = values.size(); i < count; ++i) {
                        ExampleParser.ValueContext value = values.get(i);
                        System.out.printf("\targ[%d]  = %s%n", i,
                                value.getText());
                    }
                }
            }

            @Override
            public void exitSetExpr(ExampleParser.SetExprContext ctx) {
                System.out.println("Set:");
                System.out.printf("\tcommand = %s%n", ctx.command.getText());
                System.out.printf("\tname    = %s%n", ctx.name.getText());
                System.out.printf("\tvalue   = %s%n", ctx.value().getText());
            }

        });

        parser.document();
    }
}

Input

new new, set, kill
set kill = new

Output

Call:
    command = new
    arg[0]  = new
    arg[1]  = set
    arg[2]  = kill
Set:
    command = set
    name    = kill
    value   = new

It's up to the calling code to determine whether a command is valid in a given context. The parser can't reasonably handle this because of MUMPS's loose approach to commands and identifiers. But it's not as bad as it may sound: you'll know which commands function like a call and which function like a set, so you'll be able to test the input from the Listener that ANTLR produces. In the code above, for example, it would be very easy to test whether "set" was the command passed to exitSetExpr.
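
As a rough sketch of that check, the helper below tests a command name against a small table. CommandCheck and its method names are hypothetical, and the table is an assumption that covers only the three commands from the question and their one-letter abbreviations; a real implementation would list every command it supports.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for validating the command text seen by a listener.
class CommandCheck {

    // Assumption: only the commands from the question, plus abbreviations.
    private static final Set<String> SET_LIKE =
            new HashSet<>(Arrays.asList("s", "set"));
    private static final Set<String> CALL_LIKE =
            new HashSet<>(Arrays.asList("n", "new", "k", "kill"));

    // Commands are matched case-insensitively, as in the question's lexer rules.
    static boolean isSetLike(String command) {
        return SET_LIKE.contains(command.toLowerCase(Locale.ROOT));
    }

    static boolean isCallLike(String command) {
        return CALL_LIKE.contains(command.toLowerCase(Locale.ROOT));
    }
}
```

Inside exitSetExpr, the listener could pass ctx.command.getText() to isSetLike and report an error when it returns false; exitCallExpr could do the same with isCallLike.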

Some MUMPS syntax may be more difficult to process than this, but the general approach will be the same: let the lexer treat commands and identifiers like IDs, and use the parser rules to determine whether an ID refers to a command or an identifier based on the context of the entire line.

Licensed under: CC-BY-SA with attribution