Copied both versions, compiled each with gcc -S
to get the machine language output, used sdiff
to compare side-by-side.
Results using gcc version 4.1.2 20070115 (SUSE Linux):
No optimization:
main: main:
.LFB2: .LFB2:
pushq %rbp pushq %rbp
.LCFI0: .LCFI0:
movq %rsp, %rbp movq %rsp, %rbp
.LCFI1: .LCFI1:
subq $16, %rsp subq $16, %rsp
.LCFI2: .LCFI2:
movl $0, %eax movl $0, %eax
call get_a_value call get_a_value
movl %eax, -8(%rbp) | movl %eax, %edi
movl -8(%rbp), %edi <
movl $0, %eax movl $0, %eax
call calculate_something call calculate_something
movl %eax, -4(%rbp) movl %eax, -4(%rbp)
movl -4(%rbp), %eax movl -4(%rbp), %eax
leave leave
ret ret
Basically, one extra move instruction. Both allocate the same amount of stack space (subq $16, %rsp
reserves 16 bytes for the stack), so memory-wise there's no difference.
Level 1 optimization (-O1):
main: main:
.LFB2: .LFB2:
subq $8, %rsp subq $8, %rsp
.LCFI0: .LCFI0:
movl $0, %eax movl $0, %eax
call get_a_value call get_a_value
movl %eax, %edi movl %eax, %edi
movl $0, %eax movl $0, %eax
call calculate_something call calculate_something
addq $8, %rsp addq $8, %rsp
ret ret
No differences.
Results using gcc version 2.96 20000731 (Red Hat Linux 7.2 2.96-112.7.2):
No optimization:
main: main:
pushl %ebp pushl %ebp
movl %esp, %ebp movl %esp, %ebp
subl $8, %esp subl $8, %esp
> subl $12, %esp
> subl $4, %esp
call get_a_value call get_a_value
> addl $4, %esp
movl %eax, %eax movl %eax, %eax
movl %eax, -4(%ebp) | pushl %eax
subl $12, %esp <
pushl -4(%ebp) <
call calculate_something call calculate_something
addl $16, %esp addl $16, %esp
movl %eax, %eax movl %eax, %eax
movl %eax, -8(%ebp) | movl %eax, -4(%ebp)
movl -8(%ebp), %eax | movl -4(%ebp), %eax
movl %eax, %eax movl %eax, %eax
leave leave
ret ret
Roughly the same number of instructions, ordered slightly differently.
Level 1 optimization (-O1):
main: main:
pushl %ebp pushl %ebp
movl %esp, %ebp movl %esp, %ebp
subl $8, %esp | subl $24, %esp
call get_a_value call get_a_value
subl $12, %esp | movl %eax, (%esp)
pushl %eax <
call calculate_something call calculate_something
leave leave
ret ret
Looks like the second version reserves a little more stack space.
So, for this particular example with these particular compilers, there's no huge difference between the two versions. In that case, I'd favor the first version for the following reasons:
- Easier to trace in a debugger; you can examine the value returned from
get_a_value
before passing it to calculate_something
;
- It gives you a place to do some sanity checking, in case
calculate_something
isn't well-behaved for certain inputs;
- It's a little easier on the eyes.
Just remember that terse doesn't necessarily mean fast or efficient, and what's fast/efficient under one particular compiler/hardware combination may be hopelessly busted under a different compiler/hardware combination. Some compilers actually have an easier time optimizing code that's written in a clear manner.
Your code should be, in order:
- Correct - it doesn't matter how fast it is or how little memory it uses if it doesn't meet its requirements;
- Secure - it doesn't matter how fast it is or how little memory it uses if it's a malware vector or risks exposing sensitive data to unauthorized parties (yes, I'm talking about Heart-frickin'-bleed);
- Robust - it doesn't mattter how fast it is or how little memory it uses if it dumps core because somebody sneezed in a different room;
- Maintainable - it doesn't matter how fast it is or how little memory it uses if it has to be scrapped and rewritten because the requirements changed (which they do);
- Efficient - now you can start worrying about performance and efficiency.