Question

Suppose you are designing, and writing a compiler for, a new language called Foo, among whose virtues is intended to be that it's particularly good for implementing compilers. A classic approach is to write the first version of the compiler in C, and use that to write the second version in Foo, after which it becomes self-compiling.

This does mean you have to be careful to keep backup copies of the binary (as opposed to most programs where you only have to keep backup copies of the source); once the language has evolved away from the first version, if you lost all copies of the binary, you would have nothing capable of compiling the current version. So be it.

But suppose it is intended to support both Linux and Windows. As long as it is in fact running on both platforms, it can compile itself on each platform, no problem. Supposing however you lost the binary on one platform (or had reason to suspect it had been compromised by an attacker); now there is a problem. And having to safeguard the binary for every supported platform is at least one more failure point than I'm comfortable with.

One solution would be to make it a cross-compiler, such that the binary on either platform can target both platforms.

This is not quite as easy as it sounds - while there is no problem selecting the binary output format, each platform provides the system API in the form of C header files, which normally only exist on their native platform, e.g. there is no guarantee code compiled against the Windows stdio.h will work on Linux even if compiled into Linux binary format.

Perhaps that problem could be solved by downloading the Linux header files onto a Windows box and using the Windows binary to cross-compile a Linux binary.

Are there any caveats with that solution I'm missing?

Another solution might be to maintain a separate minimum bootstrap compiler in Python, that compiles Foo into portable C, accepting only that subset of the language needed by the main Foo compiler and performing minimum error checking and no optimization, the intent being that the bootstrap compiler will thus remain simple enough that maintaining it across subsequent language versions wouldn't cost very much.

Again, are there any caveats with that solution I'm missing?

What methods have people used to solve this problem in the past?

Was it helpful?

Solution

This is a problem for C compilers themselves. It's typically solved by the use of a cross-compiler, exactly as you suggest.

The process of cross-compiling a compiler is no more difficult than cross-compiling any other project: that is to say, it's trickier than you'd like, but by no means impossible.

Of course, you first need the cross-compiler itself. This probably means some major surgery to your build-configuration system, and you'll need some kind of "sysroot" taken from the target (header, libraries, anything else you'll need to reference in a build).

So, in the end it depends on how your compiler is structured. Either it's easier to re-bootstrap using historical sources, repeating each phase of language compatibility you went through in the first place (you did use source revision control, right?), or it's easier to implement a cross-compiler configuration. I can't tell you which from here.

For many years, the GCC compiler was always written only in standard-compliant C code for exactly this reason: they wanted to be able to bring it up on any OS, given only the native C compiler for that system. Only in 2012 was it decided that C++ is now sufficiently widespread that the compiler itself can be written in it. Even then, they're only permitting themselves a subset of the language. In future, if anybody wants to port GCC to a platform that does not already have C++, they will need to either use a cross-compiler, or first port GCC 4.7 (that last major C-only version) and then move to the latest.

Additionally, the GCC build process does not "trust" the compiler it was built with. When you type "make", it first builds a reduced version of itself, it then uses that the build a full version. Finally, it uses the full version to rebuild another full version, and compares the two binaries. If the two do not match it knows that the original compiler was buggy and introduced some bad code, and the build has failed.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top