Parsing string interpolation in ANTLR

https://stackoverflow.com/questions/1850468

13-09-2019

Question

I'm working on a simple string-manipulation DSL for internal use, and I'd like the language to support string interpolation the way Ruby does.

For example:

name = "Bob"
msg = "Hello ${name}!"
print(msg)   # prints "Hello Bob!"
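To pin down the behaviour being asked for, here is a minimal reference sketch in plain Java (no ANTLR involved) that expands ${name} references from a map of variables. The class name InterpolationDemo, the method expand, the choice to expand unknown variables to an empty string, and the treatment of \0..\9 as a plain digit are all assumptions for illustration, not part of the question.

```java
import java.util.HashMap;
import java.util.Map;

public class InterpolationDemo {
    // Expands ${identifier} references using the given variables.
    // Escapes loosely follow the question's StringEscapeSeq: \t, \n and \r become
    // control characters; any other escaped character (quote, backslash, dollar,
    // digit) is kept as-is without the backslash.
    // Assumption: unknown variables expand to "".
    static String expand(String template, Map<String, String> vars) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            char c = template.charAt(i);
            if (c == '\\' && i + 1 < template.length()) {
                char next = template.charAt(i + 1);
                switch (next) {
                    case 't': out.append('\t'); break;
                    case 'n': out.append('\n'); break;
                    case 'r': out.append('\r'); break;
                    default:  out.append(next); // quote, backslash, dollar, digits
                }
                i += 2;
            } else if (c == '$' && i + 1 < template.length() && template.charAt(i + 1) == '{') {
                int close = template.indexOf('}', i + 2);
                if (close < 0) {
                    throw new IllegalArgumentException("unterminated ${ at index " + i);
                }
                String name = template.substring(i + 2, close);
                out.append(vars.getOrDefault(name, ""));
                i = close + 1;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("name", "Bob");
        System.out.println(expand("Hello ${name}!", vars));   // Hello Bob!
        System.out.println(expand("Bye \\${for} now!", vars)); // Bye ${for} now!
    }
}
```

An escaped dollar sign suppresses interpolation, so the second call prints the ${for} text literally.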

I'm trying to implement my parser in ANTLRv3, but I'm fairly inexperienced with ANTLR, so I'm not sure how to implement this feature. So far I have defined my string literals in the lexer, but in this case I obviously need to handle the interpolated content in the parser.

My current string literal grammar looks like this:

STRINGLITERAL : '"' ( StringEscapeSeq | ~( '\\' | '"' | '\r' | '\n' ) )* '"' ;
fragment StringEscapeSeq : '\\' ( 't' | 'n' | 'r' | '"' | '\\' | '$' | ('0'..'9')) ;

Moving the string literal handling into the parser seems to make everything else stop working as it should. Cursory web searches didn't turn up anything useful. Any suggestions on how to get started?

Solution

I'm not an ANTLR expert, but here is a possible grammar:

grammar Str;

parse
    :    ((Space)* statement (Space)* ';')+ (Space)* EOF
    ;

statement
    :    print | assignment
    ;

print
    :    'print' '(' (Identifier | stringLiteral) ')' 
    ;

assignment
    :    Identifier (Space)* '=' (Space)* stringLiteral
    ;

stringLiteral
    :    '"' (Identifier | EscapeSequence | NormalChar | Space | Interpolation)* '"'
    ;

Interpolation
    :    '${' Identifier '}'
    ;

Identifier
    :    ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
    ;

EscapeSequence
    :    '\\' SpecialChar
    ;

SpecialChar
    :     '"' | '\\' | '$'
    ;

Space
    :    (' ' | '\t' | '\r' | '\n')
    ;

NormalChar
    :    ~SpecialChar
    ;

As you can see, there are a couple of (Space)* occurrences scattered through the example grammar. That's because stringLiteral is a parser rule instead of a lexer rule. So when the lexer tokenizes the source file, it cannot know whether a whitespace character is part of a string literal or just a space in the source that can be ignored.
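The point about whitespace is easier to see with a hand-written lexer that keeps one piece of state. The sketch below is plain Java, not ANTLR, and the class name and the token labels ID, PUNCT, QUOTE and TEXT are made up for illustration. It tracks whether it is inside a string: outside quotes it skips whitespace, inside quotes it keeps every character. The ANTLR 3 lexer in the grammar above has no such flag, which is why Space has to survive as a token and be matched explicitly in the parser rules. ANTLR 4's lexer modes are built around the same idea.

```java
import java.util.ArrayList;
import java.util.List;

public class ModalLexerSketch {
    // Hypothetical token shapes, for illustration only:
    //   ID(x) and PUNCT(c) outside strings, QUOTE for '"', TEXT(...) for string contents.
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        boolean inString = false;              // the "mode" the ANTLR 3 lexer above lacks
        StringBuilder text = new StringBuilder();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (inString) {
                if (c == '"') {                // closing quote: flush the collected text
                    tokens.add("TEXT(" + text + ")");
                    text.setLength(0);
                    tokens.add("QUOTE");
                    inString = false;
                    i++;
                } else if (c == '\\' && i + 1 < src.length()) {
                    text.append(c).append(src.charAt(i + 1)); // keep escape pair verbatim
                    i += 2;
                } else {
                    text.append(c);            // whitespace is significant inside strings
                    i++;
                }
            } else if (c == '"') {
                tokens.add("QUOTE");
                inString = true;
                i++;
            } else if (Character.isWhitespace(c)) {
                i++;                           // outside strings, whitespace is skipped
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < src.length()
                        && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) {
                    i++;
                }
                tokens.add("ID(" + src.substring(start, i) + ")");
            } else {
                tokens.add("PUNCT(" + c + ")");
                i++;
            }
        }
        return tokens;                         // an unterminated string's text is dropped
    }

    public static void main(String[] args) {
        System.out.println(tokenize("msg = \"Hello ${name}\";"));
    }
}
```

Running main prints [ID(msg), PUNCT(=), QUOTE, TEXT(Hello ${name}), QUOTE, PUNCT(;)]. Splitting a TEXT token into literal and ${...} parts would then be a separate step.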

I tested the example with a small Java class and everything worked as expected:

/* the same grammar, but now with a bit of Java code in it */
grammar Str;

@parser::header {
    package antlrdemo;
    import java.util.HashMap;
}

@lexer::header {
    package antlrdemo;
}

@parser::members {
    HashMap<String, String> vars = new HashMap<String, String>();
}

parse
    :    ((Space)* statement (Space)* ';')+ (Space)* EOF
    ;

statement
    :    print | assignment
    ;

print
    :    'print' '(' 
         (    id=Identifier    {System.out.println("> "+vars.get($id.text));} 
         |    st=stringLiteral {System.out.println("> "+$st.value);}
         ) 
         ')' 
    ;

assignment
    :    id=Identifier (Space)* '=' (Space)* st=stringLiteral {vars.put($id.text, $st.value);}
    ;

stringLiteral returns [String value]
    :    '"'
        {StringBuilder b = new StringBuilder();} 
        (    id=Identifier           {b.append($id.text);}
        |    es=EscapeSequence       {b.append($es.text);}
        |    ch=(NormalChar | Space) {b.append($ch.text);}
        |    in=Interpolation        {b.append(vars.get($in.text.substring(2, $in.text.length()-1)));}
        )* 
        '"'
        {$value = b.toString();}
    ;

Interpolation
    :    '${' i=Identifier '}'
    ;

Identifier
    :    ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
    ;

EscapeSequence
    :    '\\' SpecialChar
    ;

SpecialChar
    :     '"' | '\\' | '$'
    ;

Space
    :    (' ' | '\t' | '\r' | '\n')
    ;

NormalChar
    :    ~SpecialChar
    ;
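To see what the stringLiteral action computes, here is a plain-Java sketch that replays it over a hand-written token list. The Tok class and the token list in main are hypothetical stand-ins for what the generated StrLexer would produce; the substring(2, length - 1) call is the same one the action uses to turn ${name} into name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StringLiteralActionSketch {
    // Hypothetical stand-in for an ANTLR token: just a type name and its text.
    static final class Tok {
        final String type;
        final String text;
        Tok(String type, String text) { this.type = type; this.text = text; }
    }

    // Replays the stringLiteral action for the tokens between the quotes.
    static String build(List<Tok> body, Map<String, String> vars) {
        StringBuilder b = new StringBuilder();
        for (Tok t : body) {
            if (t.type.equals("Interpolation")) {
                // "${name}" -> "name": skip "${" (2 chars) and drop the closing '}'
                String name = t.text.substring(2, t.text.length() - 1);
                // Like the grammar's action, an unknown name appends the text "null"
                b.append(vars.get(name));
            } else {
                // Identifier, EscapeSequence, NormalChar and Space are appended verbatim
                b.append(t.text);
            }
        }
        return b.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("name", "Bob");
        List<Tok> body = Arrays.asList(
                new Tok("Identifier", "Hello"),
                new Tok("Space", " "),
                new Tok("Interpolation", "${name}"));
        System.out.println(build(body, vars)); // Hello Bob
    }
}
```

Because EscapeSequence tokens are appended verbatim, an escaped dollar sign keeps its backslash in the result.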

And a class with a main method to test it all:

package antlrdemo;

import org.antlr.runtime.*;

public class ANTLRDemo {
    public static void main(String[] args) throws RecognitionException {
        String source = "name = \"Bob\";        \n"+
                "msg = \"Hello ${name}\";       \n"+
                "print(msg);                    \n"+
                "print(\"Bye \\${for} now!\");    ";
        ANTLRStringStream in = new ANTLRStringStream(source);
        StrLexer lexer = new StrLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        StrParser parser = new StrParser(tokens);
        parser.parse();
    }
}

which produces the following output:

> Hello Bob
> Bye \${for} now!
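The second line keeps the backslash because the EscapeSequence action appends the whole token text. If you would rather have \$ print as a plain $, one option is to append only the character after the backslash, i.e. write that action as {b.append($es.text.substring(1));}. The tiny Java sketch below contrasts the two choices; the class and method names are hypothetical and only illustrate the string operation.

```java
public class EscapeActionSketch {
    // What the EscapeSequence action appends as written: the whole token text.
    static String verbatim(String escapeToken) {
        return escapeToken;
    }

    // Hypothetical alternative: drop the leading backslash, matching an action
    // written as {b.append($es.text.substring(1));}
    static String unescaped(String escapeToken) {
        return escapeToken.substring(1);
    }

    public static void main(String[] args) {
        String es = "\\$"; // the two characters \ and $
        System.out.println(verbatim(es));  // \$
        System.out.println(unescaped(es)); // $
    }
}
```

With that alternative action, the second output line would read "> Bye ${for} now!" instead.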

Again, I'm not an expert, but this should (at least) give you an idea of how to solve it.

HTH.

Licensed under: CC-BY-SA with attribution
