Question

In two Stack Overflow questions, the answers state that using C as a compiler backend is a bad idea.

But why?

C has many compilers that can heavily optimize it. Every platform has a compiler that supports it, and it can be compiled to every architecture in existence. In addition, languages like Nim and V support generating C code.

So, I don't understand why C would be a bad idea at all. In my view, it seems a rather good choice.

Solution

Define "good" and "bad" backend

According to what criteria do you evaluate whether it is a good or bad solution? Without knowing that, we are dealing in subjective beliefs rather than objective advice:

  • Your arguments make a C compiler an attractive route to a portable, working solution very quickly. Example: Bjarne Stroustrup's first C++ implementation was Cfront, a translator that emitted C, and it proved to be a good solution for around 10 years.
  • Going via C slows down the process: you take large code as input, write even larger code as output, then let the C compiler process this larger source again, and so on. To continue the C++ example: once the ultra-fast Zortech C++ compiler became available, the slow Cfront quickly looked much less attractive.
  • Finally, it also depends on how close the two languages are. Some constructs are not easily expressed in C and need a lot of boilerplate code. It's like natural language translation: if you translate Japanese into German directly, the result may well be more accurate than translating from Japanese to English and then from English to German, because every translation step risks losing some precision.
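To make the boilerplate point concrete, here is a minimal sketch of what a generator might have to emit for a source-language closure such as `fn(x) => x + n`. The source syntax and every name below (`closure`, `make_adder`, `call_closure`) are hypothetical; they only illustrate that C has no closures, so the captured environment and the code pointer have to be spelled out by hand.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical lowering of the source-language closure `fn(x) => x + n`.
   The environment becomes an explicit struct, and the body becomes a
   plain function that receives that struct as an extra parameter. */
typedef struct closure {
    int (*code)(const struct closure *self, int arg); /* code pointer */
    int captured_n;                                   /* captured variable n */
} closure;

/* Body of the closure: reads n from the environment. */
static int adder_code(const closure *self, int x) {
    return x + self->captured_n;
}

/* make_adder(n): allocates the environment on the heap.
   Returns NULL if the allocation fails. */
closure *make_adder(int n) {
    closure *c = malloc(sizeof *c);
    if (c != NULL) {
        c->code = adder_code;
        c->captured_n = n;
    }
    return c;
}

/* Every call site in the generated code goes through the code pointer. */
int call_closure(const closure *c, int arg) {
    assert(c != NULL);
    return c->code(c, arg);
}
```

A one-line expression in the source language turns into a struct, two functions and a heap allocation, and the generator still has to decide who frees the environment.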

So there is no universally good or bad answer:

  • If you just need to write a report generator or a mathematical simulation engine, the drawback of the intermediate C step is negligible compared with the benefits.
  • But if you're doing more serious language design, you're better off with a more robust solution. Fortunately, you no longer have to hand-craft an assembly generator: between Java bytecode, Python bytecode, CIL and the other virtual machines available, you can choose how best to reuse proven, performant compilers and JIT compilers.

OTHER TIPS

"Every platform has a compiler that supports it, and it can be compiled to every architecture in existence."

But they don't all behave the same. As a compiler writer, I don't want to have to depend on (or make) a configure script to figure out if my ints are actually 4 bytes or not. I don't want to chase down weird bugs on some esoteric platform because I depended on their C compiler's implementation of some undefined behavior. And I really don't want to inspect some dump of some C code that's been optimized by God-knows-what.
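As an illustration of the integer-size concern, a generator can avoid guessing by never emitting plain `int` and by making the build fail when the platform disagrees. The sketch below assumes a C11 compiler (for `static_assert`) and a platform that provides `int32_t`; the names `lang_int` and `lang_int_bits` are hypothetical.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* int32_t */

/* Hypothetical prelude emitted at the top of every generated file.
   The source language defines its `int` as exactly 32 bits, so the
   generated code uses a fixed-width type instead of C's `int`. */
typedef int32_t lang_int;

/* Fail at compile time instead of miscompiling on an unusual platform. */
static_assert(sizeof(lang_int) * CHAR_BIT == 32,
              "generated code assumes a 32-bit lang_int");

/* Width of lang_int in bits, for generated code that needs it at run time. */
unsigned lang_int_bits(void) {
    return (unsigned)(sizeof(lang_int) * CHAR_BIT);
}
```

This removes one source of variation, but it does not address the other complaints above: undefined behavior and the readability of the optimized output remain the generator author's problem.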

LLVM and other common backend targets are a little harder to work with when writing code, but they're tons easier to work with once the code is written because they're very unambiguous about how they behave and (generally) have dedicated tools for debugging the kind of catastrophes that compiler devs manage to create.

The reason C is not a good backend for a compiler is simple:

Any translation is imperfect.

Thus, unless your source language maps perfectly onto the target language, which means it is a close analogue of C, the translation adds spurious constraints, is convoluted, or is wrong, and possibly all three.
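A small sketch of such a mismatch, assuming a hypothetical source language whose 32-bit signed `+` is defined to wrap on overflow. Emitting a plain `a + b` on `int32_t` would be wrong, because signed overflow is undefined behavior in C, so the generator has to emit something more convoluted. The helper name `lang_add_wrapping` is an assumption made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper emitted for the source-language expression `a + b`,
   where the source language guarantees wrap-around on overflow. */
int32_t lang_add_wrapping(int32_t a, int32_t b) {
    /* Signed-to-unsigned conversion is well defined (modulo 2^32). */
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    /* Storing the sum in a uint32_t reduces it modulo 2^32. */
    uint32_t usum = ua + ub;
    int32_t result;
    /* Copy the bits back: int32_t is two's complement without padding,
       so this avoids the implementation-defined unsigned-to-signed cast. */
    memcpy(&result, &usum, sizeof result);
    return result;
}

/* Usage example: the cases a naive `a + b` could get wrong. */
void lang_add_wrapping_selfcheck(void) {
    assert(lang_add_wrapping(2, 3) == 5);
    assert(lang_add_wrapping(-1, 1) == 0);
    assert(lang_add_wrapping(INT32_MAX, 1) == INT32_MIN);
}
```

Every arithmetic operator whose semantics differ from C's needs a wrapper of this kind, which is the convoluted-or-wrong trade-off described above.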

Target languages for compiler-writers are generally designed to allow fairly clean and efficient mapping of the source language(s), or are so low-level they are very easily translated to the target machine.

C has its own semantics and was created for writing in it directly, not for anything else, and it shows in that niche.

Licensed under: CC-BY-SA with attribution