Question

I'm designing a DSL which translates to Java source code. Are there notations that are commonly used to specify the semantics/translation of a compiler?

Example:

DSL:

a = b = c = 4

Translates into:

Integer temp0 = 4;
Integer a = temp0;
Integer b = temp0;
Integer c = temp0;

Thanks in advance,

Jeroen


OTHER TIPS

Pattern-matching languages can be used to formalise small tree transforms. For an example of such a DSL, take a look at the Nanopass framework. A more generic approach is to think of tree transforms as a form of term rewriting.

Such transforms are formal enough to be certified, as in CompCert.
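A minimal sketch of term rewriting in plain Java (assuming Java 16+ for records and `instanceof` patterns). The `Term` types and the three arithmetic rules are hypothetical, chosen only to show rules that either match a term and rewrite it or do not apply, driven until no rule matches (a normal form). This is not the Nanopass framework's API.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical term language: integer literals, variables, + and *.
interface Term {}
record Num(int value) implements Term {}
record Var(String name) implements Term {}
record Add(Term left, Term right) implements Term {}
record Mul(Term left, Term right) implements Term {}

final class Rewriter {
    // Each rule inspects a term: it returns the rewritten term if its
    // pattern (and side condition) matches, or Optional.empty() otherwise.
    static final List<Function<Term, Optional<Term>>> RULES = List.of(
        // n1 + n2  ->  literal sum
        t -> t instanceof Add a && a.left() instanceof Num x && a.right() instanceof Num y
                ? Optional.of(new Num(x.value() + y.value()))
                : Optional.empty(),
        // e + 0  ->  e
        t -> t instanceof Add a && a.right() instanceof Num n && n.value() == 0
                ? Optional.of(a.left())
                : Optional.empty(),
        // e * 1  ->  e
        t -> t instanceof Mul m && m.right() instanceof Num n && n.value() == 1
                ? Optional.of(m.left())
                : Optional.empty()
    );

    // Innermost strategy: normalize the children, then apply the first
    // matching rule at the root and normalize its result again, stopping
    // when no rule matches (the term is in normal form).
    static Term normalize(Term t) {
        Term current = t;
        if (t instanceof Add a) {
            current = new Add(normalize(a.left()), normalize(a.right()));
        } else if (t instanceof Mul m) {
            current = new Mul(normalize(m.left()), normalize(m.right()));
        }
        for (Function<Term, Optional<Term>> rule : RULES) {
            Optional<Term> rewritten = rule.apply(current);
            if (rewritten.isPresent()) {
                return normalize(rewritten.get());
            }
        }
        return current;
    }
}
```

For example, normalizing `(x * 1) + (2 + -2)` first reduces `x * 1` to `x` and `2 + -2` to `0`, then `x + 0` to `x`. Every rule here shrinks the term, so the loop terminates; real rewriting systems also have to worry about rule order and confluence.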

There are formal languages for defining semantics; you can see such languages, and definitions written in them, in almost any technical paper in programming-language conference proceedings. Texts on the topic are available: https://mitpress.mit.edu/.../semantics-programming-languages You need to be willing to read concise mathematical notation.
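As a small illustration of that notation (a standard big-step operational style, not a definition taken from the texts above), the DSL's assignment can be specified as an expression that evaluates its right-hand side, updates the store σ, and yields the assigned value:

```latex
\frac{}{\langle n,\ \sigma\rangle \Downarrow \langle n,\ \sigma\rangle}\ \text{(Num)}
\qquad
\frac{\langle e,\ \sigma\rangle \Downarrow \langle v,\ \sigma'\rangle}
     {\langle x = e,\ \sigma\rangle \Downarrow \langle v,\ \sigma'[x \mapsto v]\rangle}\ \text{(Assign)}

% Parsing a = b = c = 4 right-associatively as a = (b = (c = 4)) and
% applying (Assign) three times on top of (Num) gives
\langle a = b = c = 4,\ \sigma\rangle \Downarrow
  \langle 4,\ \sigma[c \mapsto 4][b \mapsto 4][a \mapsto 4]\rangle
```

A translation to Java is then correct, with respect to these rules, if the generated statements leave a, b and c all bound to 4.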

As a practical matter, these semantics are not used to drive translators/compilers; that is still a research topic. See http://www.andrew.cmu.edu/user/asubrama/dissertation.pdf To read work like this, you typically need to have spent some time with introductory texts such as the one above.

There has been more practical work on defining translations; the most practical results are program transformation systems. With such tools, one can write transformation rules using the notation of the source language (e.g., your DSL) and the notation of the target language (e.g., Java, assembler, or whatever), in the form:

 replace source_language_fragment by target_language_fragment if condition

These tools are driven by grammars for the source and target languages, and they convert the transformation rules from their readable form into AST-to-AST rewrites. Fully translating a complex DSL into another language typically requires hundreds of rules, but a key point is that they are much easier to read than the procedural code typical of hand-written translators.

Trying to follow the OP's example, and assuming one has grammars for the OP's "MyDSL" and for "Java" as a target, here are the rules written in the style of our DMS Software Reengineering Toolkit's transformation rules:

  source domain dsl;
  target domain Java;

  rule translate_single_assignment(t: dsl_IDENTIFIER, e: dsl_expression):
     " \t = \e "  -- MyDSL syntax
     ->           -- read as "rewrites to"
     " int \JavaIdentifier\(\t\)=\e;
     ".

  rule translate_multi_assignment(t1: dsl_IDENTIFIER, t2: dsl_IDENTIFIER, e: dsl_expression):
     " \t1 = \t2 = \e "  -- MyDSL syntax
     ->           -- read as "rewrites to"
     " \>\dsl \t2 = \e \statement
       int \t1;
       \t1=\t2;
     ".

You need two rules: one for the base case of a simple assignment t = e, and one for the multiple-assignment case. The multiple-assignment rule peels off the outermost assignment, generates code for it, and puts the remainder of the multiple assignment back in its original DSL form, to be reprocessed by one of the two rules.
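For readers without DMS, the same two-case scheme can be approximated in plain Java (assuming Java 16+ for records and `instanceof` patterns). This is only a hand-written sketch, not DMS output: the `DslExpr`, `Assign`, and `DslToJava` names are hypothetical, the DSL is assumed to be already parsed into an AST, and the recursive call stands in for DMS reprocessing the re-inserted DSL fragment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical AST for the DSL: assignment is an expression, so
// a = b = c = 4 parses as Assign(a, Assign(b, Assign(c, IntLit 4))).
interface DslExpr {}
record IntLit(int value) implements DslExpr {}
record Ident(String name) implements DslExpr {}
record Assign(String target, DslExpr value) implements DslExpr {}

final class DslToJava {
    // Print a non-assignment DSL expression as Java source text.
    static String expr(DslExpr e) {
        if (e instanceof IntLit lit) {
            return Integer.toString(lit.value());
        }
        if (e instanceof Ident id) {
            return id.name();
        }
        throw new IllegalArgumentException("not a simple expression: " + e);
    }

    // Translate one DSL assignment into a list of Java statements.
    static List<String> translate(Assign a) {
        List<String> out = new ArrayList<>();
        if (a.value() instanceof Assign inner) {
            // Like translate_multi_assignment (t1 = t2 = e): translate the
            // remaining DSL fragment "t2 = e" first, by whichever case
            // matches it, then emit code for the peeled-off outer assignment.
            out.addAll(translate(inner));
            out.add("int " + a.target() + ";");
            out.add(a.target() + " = " + inner.target() + ";");
        } else {
            // Like translate_single_assignment (t = e).
            out.add("int " + a.target() + " = " + expr(a.value()) + ";");
        }
        return out;
    }
}
```

Translating `a = b = c = 4` yields `int c = 4;`, `int b;`, `b = c;`, `int a;`, `a = b;`. Emitting `Integer` and a shared temporary, as in the question, would only change the strings built in `translate`.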

You can see another example of this used for refactoring (source_language == target_language) at https://stackoverflow.com/questions/22094428/programmatic-refactoring-of-java-source-files/22100670#22100670

Licensed under: CC-BY-SA with attribution