Question

So, I'm making a Hack CPU emulator, and I was wondering what the best way to calculate the output was. Would condensing the output calculations into one unreadable line be more efficient than calculating the result one step at a time? Does the compiler optimize it such that both options are fine? Basically, which of these is more efficient --

this:

    word HackALU(word x, word y, bool zx, bool nx, bool zy, bool ny, bool f, bool no)
    {
        x = zx ? 0 : x;
        y = zy ? 0 : y;

        x = nx ? ~x : x;
        y = ny ? ~y : y;

        word result = f ? x + y : x & y;

        return no ? ~result : result;    
    }

or this:

    word HackALU(word x, word y, bool zx, bool nx, bool zy, bool ny, bool f, bool no)
    {
        return no ? ~(f ? ((nx ? ~(zx ? 0 : x) : (zx ? 0 : x)) + (ny ? ~(zy ? 0 : y) : (zy ? 0 : y))) : ((nx ? ~(zx ? 0 : x) : (zx ? 0 : x)) & (ny ? ~(zy ? 0 : y) : (zy ? 0 : y)))) : (f ? ((nx ? ~(zx ? 0 : x) : (zx ? 0 : x)) + (ny ? ~(zy ? 0 : y) : (zy ? 0 : y))) : ((nx ? ~(zx ? 0 : x) : (zx ? 0 : x)) & (ny ? ~(zy ? 0 : y) : (zy ? 0 : y))));
    }

Solution

A good modern compiler will most likely generate identical code for both.
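If you want to check this rather than take it on faith, one option is to compile both versions with optimizations on (for example `gcc -O2 -S`) and compare the assembly. It is also worth confirming that the two versions really compute the same thing. Below is a rough sketch of such a check. It assumes `word` is a 16-bit unsigned integer, which the question does not state, and the names `alu_steps`, `alu_oneliner` and `versions_agree` are made up for this example.

```c
#include <assert.h>   /* only needed for the usage shown below the block */
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t word;  /* assumption: the question never shows this typedef */

/* Step-by-step version from the question. */
word alu_steps(word x, word y, bool zx, bool nx, bool zy, bool ny, bool f, bool no)
{
    x = zx ? 0 : x;
    y = zy ? 0 : y;

    x = nx ? ~x : x;
    y = ny ? ~y : y;

    word result = f ? x + y : x & y;

    return no ? ~result : result;
}

/* One-expression version from the question, only re-wrapped across lines. */
word alu_oneliner(word x, word y, bool zx, bool nx, bool zy, bool ny, bool f, bool no)
{
    return no ? ~(f ? ((nx ? ~(zx ? 0 : x) : (zx ? 0 : x)) + (ny ? ~(zy ? 0 : y) : (zy ? 0 : y)))
                    : ((nx ? ~(zx ? 0 : x) : (zx ? 0 : x)) & (ny ? ~(zy ? 0 : y) : (zy ? 0 : y))))
              :  (f ? ((nx ? ~(zx ? 0 : x) : (zx ? 0 : x)) + (ny ? ~(zy ? 0 : y) : (zy ? 0 : y)))
                    : ((nx ? ~(zx ? 0 : x) : (zx ? 0 : x)) & (ny ? ~(zy ? 0 : y) : (zy ? 0 : y))));
}

/* Returns 1 if both versions agree on every sampled input, 0 otherwise.
   The inputs are sampled with a stride; this is not an exhaustive proof. */
int versions_agree(void)
{
    for (unsigned x = 0; x <= 0xFFFF; x += 257) {
        for (unsigned y = 0; y <= 0xFFFF; y += 251) {
            for (int b = 0; b < 64; ++b) {
                word a = alu_steps((word)x, (word)y,
                                   b & 0x1, b & 0x2, b & 0x4, b & 0x8, b & 0x10, b & 0x20);
                word c = alu_oneliner((word)x, (word)y,
                                      b & 0x1, b & 0x2, b & 0x4, b & 0x8, b & 0x10, b & 0x20);
                if (a != c) {
                    return 0;
                }
            }
        }
    }
    return 1;
}
```

Calling `assert(versions_agree());` from `main` is enough to run the check. If it passes and the optimized assembly for both functions looks the same, readability is the only thing left to decide on.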

Other tips

Changes to the logic will affect performance far more than whitespace or the way temporaries are stored.

For example, some machines have no branch prediction (the PS3's SPUs, for instance). On those, your code will definitely be faster if you replace the branches with mathematical operations:

    word HackALU(word x, word y, bool zx, bool nx, bool zy, bool ny, bool f, bool no)
    {
        x = (zx == 0) * x; // [0 or 1] * x
        y = (zy == 0) * y;

        x -= (nx != 0) * (2 * x + 1); // ~x == -x - 1, so subtract 2x + 1 when nx is set
        y -= (ny != 0) * (2 * y + 1);

        word result = (f != 0) * (x + y) + (f == 0) * (x & y);

        return (no != 0) * ~result + (no == 0) * result;
    }
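Another way to write the same branch-free idea is with bit masks instead of multiplications. The following is only a sketch, not something I have timed on an SPU. It assumes `word` is a 16-bit unsigned integer, and `mask_of` and `hack_alu_masked` are names invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t word;  /* assumption: 16-bit unsigned Hack word */

/* Expands a flag into an all-zeros (false) or all-ones (true) 16-bit mask. */
word mask_of(bool flag)
{
    return (word)-(int)flag;
}

word hack_alu_masked(word x, word y, bool zx, bool nx, bool zy, bool ny, bool f, bool no)
{
    x &= (word)~mask_of(zx);   /* zx set: clear x, otherwise keep it */
    y &= (word)~mask_of(zy);

    x ^= mask_of(nx);          /* XOR with all ones is a bitwise NOT */
    y ^= mask_of(ny);

    word sum  = (word)(x + y); /* both candidates are computed unconditionally */
    word conj = x & y;
    word fm   = mask_of(f);
    word result = (word)((sum & fm) | (conj & (word)~fm));

    return result ^ mask_of(no);
}
```

For example, `hack_alu_masked(5, 3, 0, 1, 0, 0, 1, 1)` uses the Hack flag combination for `x - y` and returns 2. Whether this beats the multiplication form depends on the target; on hardware with slow multiplies the masks may help, but measure before relying on it.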

Using this loop, I actually measured the top version to be faster:

    // Needs <stdio.h> and <time.h>; word and HackALU are the ones defined above.
    unsigned n = 0; // optimization-busting counter (unsigned, so overflow wraps)
    clock_t start = clock();
    for (word x = 0; x < 1000; ++x) {
        for (word y = 0; y < 1000; ++y) {
            for (int b = 0; b < 64; ++b) {
                n += HackALU(x, y, b & 0x1, b & 0x2, b & 0x4, b & 0x8, b & 0x10, b & 0x20);
            }
        }
    }
    clock_t end = clock();
    printf("finished, elapsed ticks = %ld, n = %u\n", (long)(end - start), n);

It's pretty clear the top version will compile to fewer instructions unless the optimizer is really good. I think making it faster would require reducing the number of branches or making sure they are predicted accurately.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow