In the step where you compile the code into an object file, you can also simply tell the GCC driver (gcc) that you want assembly code instead:
gcc -S -c file.c
gcc -O2 -S -c file.c
gcc -S -masm=intel -c file.c
(the last one generates Intel-syntax instead of AT&T-syntax assembly; note that -S already stops before the assembler runs, so the -c is redundant here)
You can even feed that assembly code to the assembler (e.g. as, the GNU assembler, also known as gas) later on to get an object file which can be linked.
As to why the code is bigger, there are a number of reasons. The heuristics we humans used to hand-optimize assembly have not held for quite some time. One big goal is keeping the pipeline busy, another is vectorization. All in all, it's about parallelizing as much as possible and evicting as little as possible of the already-loaded data from the cache at any given time, in order to speed up execution.
Even though it seems counter-intuitive, this can lead to bigger, yet faster, code.