Question

I have a 64-bit Ubuntu 13.04 system. I was curious to see how 32-bit applications perform against 64-bit applications on a 64-bit system, so I compiled the following C program as 32-bit and 64-bit executables and recorded the time each took to execute. I used gcc flags to compile for three different targets (a small program to confirm these type sizes follows the compile commands below):

  • -m32: Intel 80386 architecture (int, long, pointer all set to 32 bits (ILP32))
  • -m64: AMD's x86-64 architecture (int 32 bits; long, pointer 64 bits (LP64))
  • -mx32: AMD's x86-64 architecture (int, long, pointer all set to 32 bits (ILP32), but with the CPU in long mode, sixteen 64-bit registers, and the register call ABI)
// This program solves Project Euler problem 16.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <assert.h>
#include <sys/time.h>

int sumdigit(int a, int b);

int main(void) {
    int a = 2;
    int b = 10000;
    struct timeval start, finish;
    unsigned int i;
    gettimeofday(&start, NULL);
    for(i = 0; i < 1000; i++)
        (void)sumdigit(a, b);
    gettimeofday(&finish, NULL);
    printf("Did %u calls in %.4g seconds\n", 
            i, 
            finish.tv_sec - start.tv_sec + 1E-6 * (finish.tv_usec - start.tv_usec));
    return 0;
}

int sumdigit(int a, int b) {
    // numlen = number of digits in a^b
    // pcount = exponent of 'a' reached after the ith iteration
    // dcount = number of digits in a^pcount

    int numlen = (int) (b * log10(a)) + 1;
    char *arr = calloc(numlen, sizeof *arr);
    assert(arr != NULL); // abort rather than dereference NULL on allocation failure
    int pcount = 0;
    int dcount = 1;
    arr[numlen - 1] = 1;
    int i, sum, carry;

    while(pcount < b) {
        pcount += 1;

        sum = 0; 
        carry = 0;

        for(i = numlen - 1; i >= numlen - dcount; --i) {
            sum = arr[i] * a + carry;
            carry = sum / 10;
            arr[i] = sum % 10;
        }

        while(carry > 0) {
            dcount += 1;
            sum = arr[numlen - dcount] + carry;
            carry = sum / 10;
            arr[numlen - dcount] = sum % 10;
        } 
    }

    int result = 0;
    for(i = numlen - dcount; i < numlen; ++i)
        result += arr[i];

    free(arr);
    return result;
}

The commands I used to build the different executables:

gcc -std=c99 -Wall -Wextra -Werror -pedantic -pedantic-errors pe16.c -o pe16_x32 -lm -mx32
gcc -std=c99 -Wall -Wextra -Werror -pedantic -pedantic-errors pe16.c -o pe16_32 -lm -m32
gcc -std=c99 -Wall -Wextra -Werror -pedantic -pedantic-errors pe16.c -o pe16_64 -lm
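
As a quick sanity check on the data models above, a minimal program like this (the file name sizes.c is arbitrary) can be built with the same three flags; the -m32 and -mx32 builds should report 4/4/4 bytes and the default 64-bit build 4/8/8:

#include <stdio.h>

int main(void) {
    // Report the widths of the core types to confirm ILP32 vs LP64.
    printf("int: %zu, long: %zu, pointer: %zu bytes\n",
            sizeof(int), sizeof(long), sizeof(void *));
    return 0;
}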

Here are the results I got:

ajay@ajay:c$ ./pe16_x32
Did 1000 calls in 89.19 seconds

ajay@ajay:c$ ./pe16_32
Did 1000 calls in 88.82 seconds

ajay@ajay:c$ ./pe16_64
Did 1000 calls in 92.05 seconds

Why does the 64-bit version run slower than the 32-bit one? I read that the 64-bit architecture has an improved instruction set and twice as many general-purpose registers compared to the 32-bit architecture, which allows for more optimizations. When can I expect better performance on a 64-bit system?

Edit: I turned on optimization using the -O3 flag, and now the results are:

ajay@ajay:c$ ./pe16_x32
Did 1000 calls in 38.07 seconds

ajay@ajay:c$ ./pe16_32
Did 1000 calls in 38.32 seconds

ajay@ajay:c$ ./pe16_64
Did 1000 calls in 38.27 seconds

Solution

Comparing performance of code without optimisations is rather pointless. If you care about performance, you'll only ever use optimised code.

And when you enable optimisations, you find that the performance differences are negligible. That is to be expected. The operations you perform are all integer-based, using data of the same size in all cases. Since the 32-bit and 64-bit code runs on the same integer hardware units, you should expect the same performance.
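
One way to see this for yourself is to dump the assembly for two of the builds and compare the inner loops; -S is the standard GCC option that stops after generating assembly:

gcc -std=c99 -O3 -S pe16.c -o pe16_64.s
gcc -std=c99 -O3 -m32 -S pe16.c -o pe16_32.s

Both listings should contain essentially the same sequence of integer multiplies and adds, with the divisions by 10 turned into multiplications by a reciprocal at -O3.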

You are not using any floating-point operations, which is one area where there are sometimes differences between 32-bit and 64-bit code due to different floating-point hardware units (x64 uses SSE, x86 may use x87).
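
If the code did lean on floating point, that particular difference can be removed by asking the 32-bit build to use SSE math as well; this assumes an SSE2-capable CPU, which any machine able to run x86-64 code is:

gcc -std=c99 -O3 -m32 -mfpmath=sse -msse2 pe16.c -o pe16_32_sse -lm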

In short, the results are exactly as expected.
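
As for when a 64-bit build should pull ahead: one common case is heavy use of 64-bit integers, which -m64 handles with single instructions while -m32 must emulate with instruction pairs (add/adc for addition, multi-instruction sequences for multiplication). A minimal sketch, with an arbitrary loop bound, that tends to show the gap:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    // 64-bit accumulation: one add per step on x86-64,
    // an add/adc pair per step on plain 32-bit x86.
    uint64_t acc = 0;
    for (uint64_t i = 1; i <= 500000000ULL; ++i)
        acc += i * i;
    printf("%llu\n", (unsigned long long)acc);
    return 0;
}

Other typical wins for 64-bit code are register-hungry inner loops, which benefit from the extra eight general-purpose registers, and the guaranteed availability of SSE2, neither of which this digit-by-digit loop exercises.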

Licensed under: CC-BY-SA with attribution