Runtime performance for binaries compiled on a different OS than the target OS

https://softwareengineering.stackexchange.com/questions/408504

compiler

09-03-2021

Question

Suppose two programs are exactly the same, but one is compiled directly on the host machine and the other is cross-compiled (e.g. from macOS to Linux). Can there be a difference in runtime performance between the two sets of binaries generated?

Intuitively, I'm thinking that when compiling directly on the host machine, the compiler can use more accurate information (exact OS version, type of hardware...) to perform some optimizations.

If there are differences, I'm wondering how significant they can be.

Solution

The thing is, even if you compile on the same operating system as a program runs, the compiler has no way of knowing it is the same machine. The usual case is to compile on a different machine than the end user runs it on. That means any machine-specific optimizations need to be specified as a compiler flag. Those flags can be specified just as well when cross-compiling, assuming you are using the same compiler.

So yes, knowing the exact specifications of the runtime hardware can enable you to specify somewhat better optimizations, but whether you are cross-compiling or not makes no difference in being able to use those optimizations.

OTHER TIPS

No, the compiler does the exact same thing. Compilers don't usually exploit system-specific information, in order to keep the resulting executables runnable on other versions of the same CPU architecture. Normally, a program compiled on an Intel CPU will run just fine on an AMD CPU. Of course, compilers make performance assumptions when optimizing, e.g. how long a particular instruction will take. But those assumptions won't generally depend on the current system, but rather on which CPUs were common when the compiler was released.

There are two huge limitations to this “it doesn't matter” claim.

1. Linking is difficult in a cross-compilation setting. If you statically link your libraries, things are usually fine. Dynamic linking can be very tricky, because the compilation system may have different libraries than the execution system. Parts of a library can also be compiled into the program through its header files (inline functions, macros), even when linking dynamically.
2. You can opt in to microarchitecture-specific assumptions, e.g. with -march=native on GCC/Clang. The resulting binary is not portable to CPUs that lack the instructions it uses. Instead of optimizing for the native (i.e. current) machine, you can also name a specific target, e.g. -march=skylake-avx512 or -march=znver2. Whereas -march primarily controls which CPU instruction set extensions may be used, -mtune only tunes the generated code for that microarchitecture, so the binary still runs on CPUs without those features.
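
To see what a given -march setting actually enables, you can ask the compiler through its predefined macros. The following is a minimal sketch, assuming GCC or Clang targeting x86; the function name `compiled_isa_level` is made up for illustration, and only a handful of the available feature macros are checked.

```c
/* Reports the widest x86 SIMD extension the compiler was allowed to use
 * for this translation unit. GCC and Clang define these macros according
 * to -march / -m<extension> flags; the function name is hypothetical. */
const char *compiled_isa_level(void)
{
#if defined(__AVX512F__)
    return "avx512f";
#elif defined(__AVX2__)
    return "avx2";
#elif defined(__AVX__)
    return "avx";
#elif defined(__SSE4_2__)
    return "sse4.2";
#elif defined(__SSE2__)
    return "sse2";
#else
    /* Non-x86 target, or a compiler that does not define these macros. */
    return "baseline";
#endif
}
```

Building the same file with different -march values and printing the result shows that the answer follows the flag, not the machine the compiler happens to run on. With -march=native the flag itself is derived from the build machine, which is exactly why it is unsuitable for binaries you distribute.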

In principle, a compiler can also produce multiple code paths where one code path uses instruction set extensions and a fallback path simulates those using more widely supported instructions.
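
The multiple-code-path idea can also be written by hand. This is a minimal sketch, assuming GCC or Clang on x86: `__builtin_cpu_supports` and the `target` function attribute are compiler extensions, and the names `sum_baseline`, `sum_avx2` and `sum_dispatch` are hypothetical. On other compilers or architectures the code simply uses the baseline path.

```c
#include <stddef.h>

/* Fallback path: built with whatever baseline flags the build uses. */
static long long sum_baseline(const int *a, size_t n)
{
    long long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define HAVE_X86_DISPATCH 1
/* Same loop, but the compiler may use AVX2 instructions inside this one
 * function, independent of the global -march setting. */
static __attribute__((target("avx2")))
long long sum_avx2(const int *a, size_t n)
{
    long long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
#endif

/* Chooses a code path at run time based on the CPU actually present. */
long long sum_dispatch(const int *a, size_t n)
{
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(a, n);
#endif
    return sum_baseline(a, n);
}
```

The binary stays runnable on older CPUs because the AVX2 function is only called after the run-time check succeeds. Whether the AVX2 variant is actually faster depends on the compiler's vectorizer and the workload, so it is worth measuring before adopting this pattern.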

Microarchitecture-specific optimizations are one area where JIT compilers have a theoretical advantage over ahead-of-time compilers, because the JIT compiler's output never leaves the machine and doesn't have to be compatible with any other. A similar technique is simply compiling the software for every possible target, which is feasible e.g. in an app store setting.

The performance solely depends on what machine instructions the compiler generates out of your code.

And the same compiler, called with the same flags, should always generate the same machine code for the same target architecture, regardless of which platform you run it on. The generated machine code in no way depends on the platform the compiler is running on; it depends only on the target architecture and the compiler flags.

The only difference is that when cross-compiling, everything the compiler knows about your target platform comes from the arguments you pass. If you run it directly on the target platform, you can leave out certain arguments and the compiler will use auto-detection to fill the gaps. But if you compile code to distribute rather than for yourself, you should never rely on auto-detection and should always specify all details; otherwise the result may run on your system but not on the other systems you plan to distribute to.
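
One way to check what the compiler was told (or auto-detected) about the target is to look at its predefined macros, which describe the target platform rather than the machine the compiler runs on. A minimal sketch, assuming a compiler that defines the common macros below; the helper names `target_os` and `target_arch` are hypothetical and the lists are far from complete.

```c
/* Predefined macros describe the platform the code is compiled FOR.
 * When cross-compiling from macOS to Linux, __linux__ is defined and
 * __APPLE__ is not, even though the compiler runs on macOS. */
const char *target_os(void)
{
#if defined(__linux__)
    return "linux";
#elif defined(__APPLE__)
    return "apple";
#elif defined(_WIN32)
    return "windows";
#else
    return "unknown";
#endif
}

const char *target_arch(void)
{
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__i386__)
    return "i386";
#elif defined(__arm__)
    return "arm";
#else
    return "unknown";
#endif
}
```

If a native build and a cross build report the same values here and are given the same optimization flags, there is no reason to expect the generated code to differ.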

I would generally lump the above suggestions into ... "it just isn't really worth worrying about." There are just too many variables in play for the outcome to be categorically useful ...

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange