Question

I would like to see the disassembled code in the same order that the compiler generates it after instruction scheduling. By the way, I am using GDB, and when I issue disas /m FunctionName it gives me the disassembly in source-code order. I am trying to evaluate the effectiveness of instruction rescheduling by my compiler (GCC 4.1) and would like to see how the instructions are rescheduled. Thanks!

EDIT: After looking at the disassembled code for this line:

double w_n =  (A_n[2] * x[0] + A_n[5] * y + A_n[8] * z + A_n[11])  ;

I could see it compiles to 83 bytes of instructions. But after unrolling it by two iterations:

double w_n[2] = { (A_n[2] * x[0] + A_n[5] * y + A_n[8] * z + A_n[11]), (A_n_2[2] * x[0] + A_n_2[5] * y + A_n_2[8] * z + A_n_2[11]) };

The block of code is now 226 bytes, an enormous increase in instruction count. Could anyone tell me why this is happening? I can also see from VTune that the number of instructions retired has increased after unrolling. One possible reason I can think of: with the larger block, the compiler has more opportunity to generate simple instructions, maximizing the throughput of the instruction prefetch and decode units.

Any help is greatly appreciated. Thanks!!


Solution

If the compiler has rescheduled instructions, you should be able to see that when disassembling in GDB: a plain disassemble FunctionName (without /m) lists the instructions in address order, exactly as they sit in the binary. It is the /m modifier that regroups them by source line.

Otherwise, you can use objdump directly on the command line; that's my preferred way of inspecting the code in an ELF binary:

$ objdump --disassemble a.out | less

It doesn't reference the source at all, so it should really show what's in the binary itself.

OTHER TIPS

In the step in which you compile the code into an object file, you could also simply tell the GCC driver (gcc) that you want to get assembly code:

gcc -S -c file.c
gcc -O2 -S -c file.c
gcc -S -masm=intel -c file.c

(the third command generates Intel rather than AT&T syntax assembly; note that with -S the -c flag is redundant, since -S already stops compilation before the assembling step)

You can then feed that assembly code to the assembler (e.g. as, the GNU assembler) later on to get an object file which can be linked.


As to why the code is bigger, there are a number of reasons. The heuristics we humans used when hand-optimizing assembly have not held true for quite some time. One big goal is pipelining, another is vectorization. All in all, it is about exploiting as much parallelism as possible while invalidating as little as possible of the (already filled) cache at any given time, in order to speed up execution.

Even though it seems counter-intuitive, this can lead to bigger, yet faster, code.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow