Question

I am just at the beginning of my graduation project, which is supposed to last six months. The goal of the project is to implement a .NET compiler for a scripting language. I took Compiler Construction as part of my curriculum and know the basic steps of implementing a compiler in general. However, we used Bison and a simple compiler with GCC as the back end, so I don't know much about implementing compilers on the .NET platform.

After some research on this topic, I found the following alternatives for code generation. I am not talking about the other essential parts of a compiler, such as the parser; they are out of scope here.

  1. Direct code generation using Reflection.Emit.
  2. Using the Common Compiler Infrastructure (CCI), an abstraction over Reflection.Emit, to automate some of the code generation.
  3. Using CodeDOM to generate C# or VB code and compile it at runtime.
  4. Roslyn, the new C# "compiler as a service", currently available as a CTP.
  5. The DLR, which supports dynamic code generation and offers interfaces for generating code at runtime via expression trees, etc.
  6. Mono.Cecil, a library shipped with Mono, which seems to offer some code generation functionality as well.

The primary goal of my project is to delve deeper into the internals of .NET, to learn compiler construction and to get a good grade for my work. The secondary goal is to produce a compiler implementation that can later be released to the community under a permissive open-source license.

So, which approach would be the most interesting, educational, entertaining and promising? I would definitely try all of them if I had more time, but I have to submit my work in exactly six months to get a passing grade...

Thank you in advance, Alexander.


Solution

If you want the easier route and your language can reasonably be translated into C#, I would recommend generating C# code (or something similar) and compiling that. Roslyn would probably be best for that. Apparently CCI can do this too, using CCI Code, but I've never used it. I wouldn't recommend CodeDOM, because it doesn't support features such as static classes or extension methods.
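The "generate C# and compile it" route boils down to a tree walk that prints source text. The sketch below is in Python purely for illustration; the AST classes (Num, Var, BinOp) and the emit_program helper are hypothetical, and the step of handing the string to Roslyn (or csc) is not shown. Note that printing text directly sidesteps CodeDOM's limitations: a static class is just another string.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical AST for a toy scripting language.
@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str          # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Var, BinOp]

def to_csharp(expr: Expr) -> str:
    """Translate an expression to C# source. Every binary operation is
    parenthesised so C#'s precedence rules cannot change its meaning."""
    if isinstance(expr, Num):
        return str(expr.value)
    if isinstance(expr, Var):
        return expr.name
    if isinstance(expr, BinOp):
        return f"({to_csharp(expr.left)} {expr.op} {to_csharp(expr.right)})"
    raise TypeError(f"unknown node: {expr!r}")

def emit_program(name: str, params: list[str], body: Expr) -> str:
    """Wrap the expression in a static C# class with one static method."""
    args = ", ".join(f"int {p}" for p in params)
    return (
        "public static class Script\n"
        "{\n"
        f"    public static int {name}({args})\n"
        "    {\n"
        f"        return {to_csharp(body)};\n"
        "    }\n"
        "}\n"
    )
```

For example, emit_program("F", ["x"], BinOp("*", BinOp("+", Var("x"), Num(1)), Num(2))) yields a class whose method body is `return ((x + 1) * 2);`. Semantic mismatches (for instance integer division or overflow rules that differ between your language and C#) are where this approach needs the most care.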

If you want more control, or want to go low-level, you can generate CIL directly using Reflection.Emit. But it will be (much) more work, especially if you're not familiar with CIL. I think Cecil can be used the same way, but it's intended for something else (inspecting and rewriting existing assemblies), and I don't think it offers any advantages over Reflection.Emit.
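Much of the extra work with Reflection.Emit comes from CIL being a stack machine: operands must be pushed before the instruction that consumes them. The Python sketch below (tuple-encoded AST and helper names are hypothetical) produces a list of (mnemonic, operand) pairs; in real C# each pair would roughly become one ILGenerator.Emit call such as il.Emit(OpCodes.Add). The small interpreter is an assumption of mine, not part of any .NET API: it lets you unit-test the generator's output before touching Reflection.Emit.

```python
import operator

# Hypothetical expression AST encoded as tuples:
#   ("num", n)         integer literal
#   ("arg", i)         i-th method argument
#   (op, left, right)  op is "+", "-" or "*"
MNEMONIC = {"+": "add", "-": "sub", "*": "mul"}
ARITH = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def gen_cil(expr, out):
    """Post-order walk: both operands are pushed before the operator."""
    kind = expr[0]
    if kind == "num":
        out.append(("ldc.i4", expr[1]))
    elif kind == "arg":
        out.append(("ldarg", expr[1]))
    else:
        gen_cil(expr[1], out)
        gen_cil(expr[2], out)
        out.append((MNEMONIC[kind], None))
    return out

def compile_method(body):
    """Instruction list for a method that returns the value of body."""
    return gen_cil(body, []) + [("ret", None)]

def run(instrs, args):
    """Tiny evaluation-stack interpreter for testing generated code."""
    stack = []
    for op, operand in instrs:
        if op == "ldc.i4":
            stack.append(operand)
        elif op == "ldarg":
            stack.append(args[operand])
        elif op == "ret":
            return stack.pop()
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(ARITH[op](left, right))
    raise ValueError("method ended without 'ret'")
```

For (x + 1) * 2 this yields ldarg 0, ldc.i4 1, add, ldc.i4 2, mul, ret; running it with x = 4 returns 10.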

The DLR is meant, as its full name (Dynamic Language Runtime) suggests, for dynamic languages. The expression trees it uses can be used for code generation, but I think they are best suited to generating relatively simple methods at runtime. Of course, the DLR itself can be very useful if your language is dynamic.
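With expression trees you build a tree of node objects and then ask the runtime to compile it; in .NET, Expression.Lambda(body, parameters).Compile() returns a delegate. As a rough Python analogy only (the .NET version produces real executable code, while this one does not), the sketch below "compiles" a tuple-encoded tree into nested closures once, so later calls no longer inspect the tree. The node encoding and the make_lambda name are hypothetical.

```python
import operator
from typing import Callable

# Hypothetical nodes: ("const", value) | ("param", name) | (op, left, right)
BINARY = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def compile_expr(node) -> Callable[[dict], int]:
    """Turn a tree into a closure once; calling the closure later only
    runs the captured operations, with no tree inspection."""
    kind = node[0]
    if kind == "const":
        value = node[1]
        return lambda env: value
    if kind == "param":
        name = node[1]
        return lambda env: env[name]
    fn = BINARY[kind]
    left, right = compile_expr(node[1]), compile_expr(node[2])
    return lambda env: fn(left(env), right(env))

def make_lambda(params, body):
    """Loose analogue of Expression.Lambda(body, params).Compile():
    returns an ordinary callable taking positional arguments."""
    compiled = compile_expr(body)

    def call(*args):
        if len(args) != len(params):
            raise TypeError(f"expected {len(params)} arguments, got {len(args)}")
        return compiled(dict(zip(params, args)))

    return call
```

For example, make_lambda(["x"], ("+", ("*", ("param", "x"), ("param", "x")), ("const", 1))) returns a function that maps 5 to 26. This works well for small, self-contained methods, which matches the point above; whole programs with classes and control flow are a poorer fit.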

OTHER TIPS

Boo is a language and compiler that targets the CLI. It appears to be open source, so you could study how it does this.

Back when I was writing compilers, I would generate assembly language source code and then run it through the system's assembler. That way I could easily see what I was generating. It's a whole lot easier to read mov ax, bx (x86 assembly) than to decode hex opcodes.

If I wasn't allowed to use the assembler in the final product, I developed the compiler using the assembly output and then, once everything was working, added a binary output path. The beauty of it was that all I had to change was the actual output (opcodes and binary values rather than text).

I would suggest doing something similar for your project. Develop it initially to output MSIL that you can assemble with ILASM. That way, you can easily verify your code generator's output by reading the generated code. Once you're confident that your code generator is working, add an output option that will use Reflection.Emit or the Common Compiler Infrastructure.
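One way to keep the "only the output changes" property is to have the code generator talk to an emitter interface and swap emitters. The Python sketch below is an illustration under assumptions: the tuple AST and class names are hypothetical, only a handful of CIL opcodes are covered, and ldc.i4 always uses its long form (opcode 0x20 followed by a little-endian int32). The text emitter produces only a method body; a real ILASM file also needs .assembly, class and .method declarations around it. A Reflection.Emit back end would be a third emitter whose emit method calls ILGenerator.Emit instead.

```python
import struct

# CIL opcode bytes for the few instructions this sketch uses.
OPCODE_BYTES = {"ldc.i4": 0x20, "add": 0x58, "sub": 0x59, "mul": 0x5A, "ret": 0x2A}
MNEMONIC = {"+": "add", "-": "sub", "*": "mul"}

class TextEmitter:
    """ILASM-style text: easy to read and to assemble with ilasm."""
    def __init__(self):
        self.lines = []

    def emit(self, op, operand=None):
        self.lines.append(f"    {op}" if operand is None else f"    {op} {operand}")

    def result(self):
        return "\n".join(self.lines)

class BinaryEmitter:
    """The same instructions as raw CIL bytes."""
    def __init__(self):
        self.buf = bytearray()

    def emit(self, op, operand=None):
        self.buf.append(OPCODE_BYTES[op])
        if op == "ldc.i4":
            self.buf += struct.pack("<i", operand)  # int32, little-endian

    def result(self):
        return bytes(self.buf)

def gen(expr, emitter):
    """Code generator for ("num", n) | (op, left, right); it never knows
    which output format it is producing."""
    if expr[0] == "num":
        emitter.emit("ldc.i4", expr[1])
    else:
        gen(expr[1], emitter)
        gen(expr[2], emitter)
        emitter.emit(MNEMONIC[expr[0]])

def compile_body(expr, emitter):
    gen(expr, emitter)
    emitter.emit("ret")
    return emitter.result()
```

With the text emitter, 2 + 3 comes out as the readable lines ldc.i4 2, ldc.i4 3, add, ret; with the binary emitter the same call yields the bytes 20 02 00 00 00 20 03 00 00 00 58 2A.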

Licensed under: CC-BY-SA with attribution