Question

When writing a compiler, what are the advantages and disadvantages of using LLVM IR vs C as the target language? I know both are used, and I imagine that the final machine code would be similar if I were to use Clang to compile the C. So what other things should I consider?

Solution

I've used LLVM IR for a few compiler back ends and have worked with compilers that use C as a back end. One advantage I found with LLVM IR is that it is typed: it is hard to produce completely ill-formed output without getting errors from the LLVM libraries.

It is also easier to keep a close correlation between the source code and the IR for debugging, in my opinion.

Plus, you get all the cool LLVM command line tools to analyse and process the IR your front end emits.
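To make the typing point above concrete, here is a minimal sketch of a toy back end that emits the same function once as C and once as textual LLVM IR. The helper names `emit_c_add` and `emit_llvm_add` are hypothetical and only build strings; nothing here calls the LLVM libraries. Note how every operand and instruction in the IR carries an explicit type.

```python
# Toy back end: emit one function, add(a, b), as C and as LLVM IR text.
# Both helper names are illustrative assumptions, not part of any real API.

def emit_c_add(name: str) -> str:
    """Return C source for a function that adds two 32-bit integers."""
    return (
        "#include <stdint.h>\n"
        f"int32_t {name}(int32_t a, int32_t b) {{\n"
        "    return a + b;\n"
        "}\n"
    )


def emit_llvm_add(name: str) -> str:
    """Return textual LLVM IR for the same function.

    A plain `add` wraps on overflow; Clang would emit `add nsw` for
    signed C integers, so the two outputs are not strictly equivalent.
    """
    return (
        f"define i32 @{name}(i32 %a, i32 %b) {{\n"
        "entry:\n"
        "  %sum = add i32 %a, %b\n"
        "  ret i32 %sum\n"
        "}\n"
    )


if __name__ == "__main__":
    print(emit_c_add("add"))
    print(emit_llvm_add("add"))
```

The IR text can be fed to tools such as `llc` or `opt`, while the C text goes through any C compiler.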

Other tips

LLVM advantages:

  1. JIT - you can compile and run your code dynamically. Sure, the same is possible with C (e.g., using an embedded tcc), but that is a much less robust and portable option.
  2. You can run your own optimisation passes over the generated IR.
  3. Reflection for free - inspecting the generated code is much easier with LLVM.
  4. The LLVM library is not as big as most C compilers (not counting tcc, of course).

LLVM drawbacks:

  1. The IR is not portable: you have to change it slightly depending on your target. There is a somewhat portable subset of LLVM IR, but relying on it is still a dodgy practice.
  2. The runtime dependency on the C++ libraries might be a bit of an issue.
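The portability drawback above can be sketched with a hypothetical front end that needs to declare the C library function `strlen`. In C, `size_t` hides the integer width; in LLVM IR the emitter has to pick `i32` or `i64` itself. The helper names are assumptions, the `ptr` spelling assumes LLVM 15 or later (opaque pointers), and the sketch assumes `size_t` has the same width as a pointer, which holds on common targets but not everywhere.

```python
# Sketch: the same declaration is target-independent in C but
# target-dependent in LLVM IR. Helper names are hypothetical.

def emit_c_strlen_decl() -> str:
    """C declaration: size_t adapts to the target automatically."""
    return "#include <stddef.h>\nsize_t strlen(const char *s);\n"


def emit_llvm_strlen_decl(pointer_bits: int) -> str:
    """LLVM IR declaration: the front end must choose the integer width.

    Assumes size_t is pointer-sized on the chosen target.
    """
    if pointer_bits not in (32, 64):
        raise ValueError("this sketch only handles 32- and 64-bit targets")
    size_ty = f"i{pointer_bits}"
    return f"declare {size_ty} @strlen(ptr)\n"


if __name__ == "__main__":
    print(emit_c_strlen_decl())
    print(emit_llvm_strlen_decl(32))
    print(emit_llvm_strlen_decl(64))
```

A real module would also carry `target triple` and `target datalayout` lines, which are further places where the emitted IR differs per target.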

I doubt you can implement proper debugging support for your language when targeting C.

Another consideration is architectures and OSes for which there is obviously no Clang, or for which it is still in an experimental state.

C is more widely accepted, but LLVM IR allows you to spoon feed the LLVM engine. Not all paths to IR are equal.

I will use LLVM to refer to the framework, and LLVM IR to refer to the target language.

C Advantages

  1. Cross-platform
  2. Debugging (Please read below. It is partly related to point 4.)
  3. Interoperability
  4. Ease of use

LLVM IR Advantages

  1. Performance
  2. Customization options
  3. Memory footprint
  4. Strong typing/Safety

C

  1. C compilers exist for all sorts of embedded systems, even though LLVM has gained more targets lately. It can be argued that C has a slight advantage over LLVM IR (intermediate representation) in this category.

  2. The main advantage of targeting C instead of LLVM IR is that the generated code is at a higher level. With a standard debugger such as GDB, it can be argued that it is easier to reason about the behavior of the generated code. It is also easier to build a debugger for the language compiled to C on top of a debugger such as GDB.

  3. Interoperability is a fuzzier point. However, C has a well-defined application binary interface (ABI) on each platform. It is thus easier to write libraries and to interface them with other programs written in C and/or C++. In addition, many languages, such as Java, provide standardized interfaces to C.

  4. It can be argued that it is easier to get started and get something working by targeting C.
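One common way to get the debugging benefit described in point 2 is the standard C `#line` directive, which tells the C compiler (and, through its debug info, the debugger) which line of the original source a generated statement came from. The following is a minimal sketch; the function name, the `prog.toy` file name, and the flat list of statements are all hypothetical simplifications.

```python
# Sketch: annotate generated C with #line directives so that a debugger
# such as GDB reports positions in the original source file.
# emit_c_with_line_info and its input format are illustrative assumptions.

def emit_c_with_line_info(source_file: str, statements: list) -> str:
    """Emit a C main() from (source_line_number, c_statement) pairs."""
    out = ["int main(void) {"]
    for line_no, c_stmt in statements:
        # The next physical line is attributed to line_no of source_file.
        out.append(f'#line {line_no} "{source_file}"')
        out.append(f"    {c_stmt}")
    out.append("}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(emit_c_with_line_info(
        "prog.toy",
        [(3, "int x = 1;"), (7, "return x;")],
    ))
```

Compiling the result with debug information enabled (for example `-g` with GCC or Clang) lets breakpoints and stack traces refer to `prog.toy` rather than to the generated C file.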

LLVM

  1. C is quite a high-level language, and if the generated C is not written idiomatically, performance may degrade (depending on the target compiler and the assumptions that compiler makes). Some papers, such as "An LLVM Backend for GHC", illustrate disadvantages of C and advantages of LLVM IR as a target language.

  2. Since LLVM (the framework) is built as a collection of reusable units, it is easy to write passes tailored to your specific language. It is also easier to write a custom GC (as of 2020 there is some support for this). This is also possible when targeting C, and there are garbage collectors such as Boehm GC. However, C is not designed as an intermediate language.

  3. Memory footprint. Generated C code has a larger memory footprint than LLVM bitcode. If you are compiling and linking a big system, you are likely to see compilation time advantages when targeting LLVM.

  4. C is a weakly typed language, while LLVM IR is a strongly typed one. It can, therefore, be argued that it is safer to target LLVM IR.
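As a rough illustration of the typing point above: in C, `a + b` with an `int` and a `double` compiles silently thanks to implicit conversion, whereas LLVM IR requires both operands of an arithmetic instruction to have the same type, so the front end must emit an explicit `sitofp` first. The sketch below mimics that rule in a very reduced form; `emit_add` and `emit_sitofp` are hypothetical string-building helpers and do not call LLVM.

```python
# Sketch: a toy IR emitter that rejects mixed operand types, in the
# spirit of LLVM's rule that `add`/`fadd` operands share one type.
# All names here are illustrative assumptions.

FLOAT_TYPES = ("float", "double")


def emit_add(result, lhs_ty, lhs, rhs_ty, rhs):
    """Emit an add or fadd instruction; operand types must match."""
    if lhs_ty != rhs_ty:
        raise TypeError(f"operand types differ: {lhs_ty} vs {rhs_ty}")
    opcode = "fadd" if lhs_ty in FLOAT_TYPES else "add"
    return f"  {result} = {opcode} {lhs_ty} {lhs}, {rhs}"


def emit_sitofp(result, src_ty, src, dst_ty="double"):
    """Emit an explicit signed-integer-to-floating-point conversion."""
    return f"  {result} = sitofp {src_ty} {src} to {dst_ty}"


if __name__ == "__main__":
    # Mixed types need an explicit conversion before the add.
    print(emit_sitofp("%a.fp", "i32", "%a"))
    print(emit_add("%sum", "double", "%a.fp", "double", "%b"))
```

Calling `emit_add("%sum", "i32", "%a", "double", "%b")` raises `TypeError` in this sketch, which is the kind of mistake that a C back end would let through unnoticed.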

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow