gcc branch prediction
-
18-06-2021 - |
Question
Here's my demo program:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int cmp(const void *d1, const void *d2)
{
int a, b;
a = *(int const *) d1;
b = *(int const *) d2;
if (a > b)
return 1;
else if (a == b)
return 0;
return -1;
}
int main()
{
int seed = time(NULL);
srandom(seed);
int i, n, max = 32768, a[max];
for (n=0; n < max; n++) {
int r = random() % 256;
a[n] = r;
}
qsort(a, max, sizeof(int), cmp);
clock_t beg = clock();
long long int sum = 0;
for (i=0; i < 20000; i++)
{
for (n=0; n < max; n++) {
if (a[n] >= 128)
sum += a[n];
}
}
clock_t end = clock();
double sec = (end - beg) / CLOCKS_PER_SEC;
printf("sec: %f\n", sec);
printf("sum: %lld\n", sum);
return 0;
}
unsorted
sec: 5.000000
sum: 63043880000
sorted
sec: 1.000000
sum: 62925420000
Here's an assembly diff of two versions of the program, one with qsort
and one without:
--- unsorted.s
+++ sorted.s
@@ -58,7 +58,7 @@
shrl $4, %eax
sall $4, %eax
subl %eax, %esp
- leal 4(%esp), %eax
+ leal 16(%esp), %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
@@ -83,6 +83,13 @@
movl -16(%ebp), %eax
cmpl -24(%ebp), %eax
jl .L7
+ movl -24(%ebp), %eax
+ movl $cmp, 12(%esp)
+ movl $4, 8(%esp)
+ movl %eax, 4(%esp)
+ movl -32(%ebp), %eax
+ movl %eax, (%esp)
+ call qsort
movl $0, -48(%ebp)
movl $0, -44(%ebp)
movl $0, -12(%ebp)
As far as I understand the assembly output, the sorted version just has more code due to passing values to qsort
, but I don't see any branching optimization/prediction/whatever thing. Maybe I'm looking in the wrong direction?
Solution
Branch prediction is not something you will see at the assembly code level; it is done by the CPU itself.
OTHER TIPS
Built-in Function: long
__builtin_expect (long exp, long c)
You may use
__builtin_expect
to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs
), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.The return value is the value of
exp
, which should be an integral expression. The semantics of the built-in are that it is expected thatexp == c
. For example:if (__builtin_expect (x, 0)) foo ();
indicates that we do not expect to call
foo
, since we expectx
to be zero. Since you are limited to integral expressions forexp
, you should use constructions such asif (__builtin_expect (ptr != NULL, 1)) foo (*ptr);
when testing pointer or floating-point values.
Otherwise the branch prediction is determined by the processor...
Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types:
• Conditional branches.
• Direct calls and jumps.
• Indirect calls and jumps.
• Returns.