Question

So the question is:

  • How can I optimize function entry and exit code for speed, in a portable way, using GCC and plain C?

I am interested in relevant compiler options and similar techniques. My goal is to write a CPU emulator in which the instruction set is decoded using call tables. I have already eliminated every function call I reasonably could, but due to the structure of the instruction set, 2-3 such calls per emulated instruction are necessary (so I cannot eliminate any more branch mispredictions here either).
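To make the setup concrete, here is a minimal sketch of call-table dispatch of the kind described above. The CPU state layout, the three opcodes, and all names (cpu_t, op_table, cpu_run) are hypothetical and only illustrate the shape of the hot path, not the actual emulator.

```c
#include <stdint.h>

/* Hypothetical CPU state; the fields are assumptions for illustration. */
typedef struct {
    uint8_t        a;    /* accumulator */
    uint16_t       pc;   /* program counter */
    const uint8_t *mem;  /* emulated memory (read-only here) */
} cpu_t;

typedef void (*op_fn)(cpu_t *cpu);

/* Tiny handlers: each does very little work, so call overhead dominates. */
static void op_nop(cpu_t *cpu)     { (void)cpu; }
static void op_inc(cpu_t *cpu)     { cpu->a++; }
static void op_lda_imm(cpu_t *cpu) { cpu->a = cpu->mem[cpu->pc++]; }

/* Call table indexed by opcode; unlisted entries stay NULL. */
static const op_fn op_table[256] = {
    [0x00] = op_nop,
    [0x01] = op_inc,
    [0x02] = op_lda_imm,
};

/* Fetch an opcode and dispatch through the table, `steps` times. */
void cpu_run(cpu_t *cpu, unsigned steps)
{
    while (steps--) {
        uint8_t opcode  = cpu->mem[cpu->pc++];
        op_fn   handler = op_table[opcode];
        if (handler)            /* skip undefined opcodes in this sketch */
            handler(cpu);
    }
}
```

Every emulated instruction costs at least one indirect call here, which is why the entry and exit code of the handlers matters so much.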

Based on analysing the assembly output (x86, 32-bit), the option -fomit-frame-pointer seems worthwhile (since I don't care about the lost debuggability here). Beyond that, looking over the complete emulator, it seems it could benefit from better overall register and stack management (not saving every single thing on every entry). My impression of the generated assembly is that it spends more effort manipulating the stack than doing useful work.

So the situation is basically that there are a whole lot of little functions that are called a great many times and cannot be eliminated from the code.

I don't want to switch away from interpreting emulation, since it should be the most portable approach (more portable, in any case, than any solution that recompiles code).


Solution

On x86-32, the ABIs for common operating systems have standard calling conventions that use the stack to pass arguments to functions, because there are few general-purpose registers. One way to speed up calls to functions that take only a few, relatively simple arguments is to use a different calling convention (such as fastcall) that passes the arguments in registers.
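A minimal sketch of how this could look with GCC, assuming hypothetical names (EMU_CALL, emu_state, emu_ops, emu_step). GCC's fastcall function attribute passes the first two integer arguments in ECX and EDX on x86-32; the regparm attribute is a related alternative. The macro expands to nothing on other targets, so the code stays portable.

```c
#include <stdint.h>

/* Apply fastcall only where it is meaningful: GCC (or compatible) on x86-32.
   Elsewhere the macro is empty and the platform's default ABI is used. */
#if defined(__GNUC__) && defined(__i386__)
#  define EMU_CALL __attribute__((fastcall))
#else
#  define EMU_CALL
#endif

/* Hypothetical emulator state, for illustration only. */
typedef struct { uint32_t acc; } emu_state;

/* The calling convention is part of the function pointer type, so the
   table entries and the handlers must agree on it. */
typedef void (EMU_CALL *emu_op)(emu_state *st, uint32_t operand);

static void EMU_CALL op_add(emu_state *st, uint32_t operand) { st->acc += operand; }
static void EMU_CALL op_xor(emu_state *st, uint32_t operand) { st->acc ^= operand; }

static const emu_op emu_ops[2] = { op_add, op_xor };

/* Dispatch one operation; the index is masked to stay inside the table. */
void emu_step(emu_state *st, unsigned index, uint32_t operand)
{
    emu_ops[index & 1u](st, operand);
}
```

Because both arguments fit in registers, the handlers need no stack reads for their parameters on x86-32. Whether this helps measurably depends on the emulator, so it is worth comparing the generated assembly before and after.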

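If moving to x86-64 is an option (and it should be, it's been around for ages...), the ABIs are much better suited to fast function calls: the number of general-purpose registers doubled from 8 to 16, so the standard calling conventions already pass the first several integer arguments in registers.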

Licensed under: CC-BY-SA with attribution