Code substitution for DSL using ANTLR

Question 1

So I have found one solution to the issue. I think it could be better - as it potentially does a lot of array copying - but at least it works for now.

EDIT: I was wrong before, and my solution would consume ANY & that it found, including those in valid locations such as inside string constants. This seems like a better solution:

First, I extended the InputStream so that it is able to rewrite the input steam when a & is encountered. This unfortunately involves copying the array, which I can maybe resolve in the future:

MacroInputStream.java

    package preprocessor;

    import org.antlr.v4.runtime.ANTLRInputStream;

    public class MacroInputStream extends ANTLRInputStream {

      private HashMap<String, String> map;

      public MacroInputStream(String s, HashMap<String, String> map) {
        super(s);
        this.map = map;
      }

      public void rewrite(int startIndex, int stopIndex, String replaceText) {
        int length = stopIndex-startIndex+1;
        char[] replData = replaceText.toCharArray();
        if (replData.length == length) {
          for (int i = 0; i < length; i++) data[startIndex+i] = replData[i];
        } else {
          char[] newData = new char[data.length+replData.length-length];
          System.arraycopy(data, 0, newData, 0, startIndex);
          System.arraycopy(replData, 0, newData, startIndex, replData.length);
          System.arraycopy(data, stopIndex+1, newData, startIndex+replData.length, data.length-(stopIndex+1));
          data = newData;
          n = data.length;
        }
      }
    }

Secondly, I extended the Lexer so that when a VARIABLE token is encountered, the rewrite method above is called:

MacroGrammarLexer.java

package language;

import language.DSL_GrammarLexer;

import org.antlr.v4.runtime.Token;

import java.util.HashMap;

public class MacroGrammarLexer extends MacroGrammarLexer{

  private HashMap<String, String> map;

  public DSL_GrammarLexerPre(MacroInputStream input, HashMap<String, String> map) {
    super(input);
    this.map = map;
    // TODO Auto-generated constructor stub
  }

  private MacroInputStream getInput() {
    return (MacroInputStream) _input;
  }

  @Override
  public Token nextToken() {
    Token t = super.nextToken();
    if (t.getType() == VARIABLE) {
      System.out.println("Encountered token " + t.getText()+" ===> rewriting!!!");
      getInput().rewrite(t.getStartIndex(), t.getStopIndex(),
          map.get(t.getText().substring(1)));
      getInput().seek(t.getStartIndex()); // reset input stream to previous
      return super.nextToken();
    }
    return t;   
  }   

}

Lastly, I modified the generated parser to set the variables at the time of parsing:

DSL_GrammarParser.java

    ...
    ...
    HashMap<String, String> map;  // same map as before, passed as a new argument.
    ...
    ...

public final SetContext set() throws RecognitionException {
  SetContext _localctx = new SetContext(_ctx, getState());
    enterRule(_localctx, 130, RULE_set);
    try {
        enterOuterAlt(_localctx, 1);
        {
        String vname = null; String vval = null;              // set up variables
        setState(1215); match(SET);
        setState(1216); vname = variable_name().getText();    // set vname
        setState(1217); match(EQUALS);
        setState(1218); vval = string_constant().getText();   // set vval
        System.out.println("Found SET " + vname +" = " + vval+";");
            map.put(vname, vval);
        }
    }
    catch (RecognitionException re) {
        _localctx.exception = re;
        _errHandler.reportError(this, re);
        _errHandler.recover(this, re);
    }
    finally {
        exitRule();
    }
    return _localctx;
}
    ...
    ...

Unfortunately this method is final so this will make maintenance a bit more difficult, but it works for now.

Question 2

The standard pattern to handling your requirements is to implement a symbol table. The simplest form is as a key:value store. In your visitor, add var declarations as encountered, and read out the values as var references are encountered.

As described, your DSL does not define a scoping requirement on the variables declared. If you do require scoped variables, then use a stack of key:value stores, pushing and popping on scope entry and exit.

See this related StackOverflow answer.

Separately, since your strings may contain commands, you can simply parse the contents as part of your initial parse. That is, expand your grammar with a rule that includes the full set of valid contents:

set_variable
   : 'SET' ID '=' stringLiteral ';'
   ;

stringLiteral: 
   Quote Quote? ( 
     (    set_variable
        | print_line
        | VARIABLE
        | ID
     )
     | STRING_CONSTANT  // redefine without the quotes
   )
   Quote
   ;