Question

I'm a fairly competent Java programmer who's very new to C. I am trying to optimize a routine that has four modes of operation.

I loop over all the pixels in an image and compute a new pixel value depending on the 'mode' passed.

My question regards the overhead of a switch statement within two nested for loops. I'd be interested in any links to documentation regarding the relative efficiency of basic C statements, math and logical operations.

The code goes as follows:

for (x = 0; x < width; x++) {
    for (y = 0; y < height; y++) {
        switch (mode)                  /* select the type of calculation */
        {
            case 0:
                weight = dCentre / maxDistanceEdge;
            case 1:
                weight = (float)x/width;
                break;
            case 2:
                weight = (float)y/height;
                break;
            case 3:
                weight = dBottomLeft / maxDistanceCorner;
                break;
            case 4:
                weight = dTopRight / maxDistanceCorner;
                break;
            default:
                weight = 1;
                break;
        }
        // Calculate the new pixel value given the weight
        ...
    }
}

Would you expect to see much overhead if this was over a 5000 x 5000 pixel image? I've tried to do some testing but my results are all over the place as the system (Mobile Device) has all sorts of stuff running in the background that may skew results.

The other option is to have a separate method for each mode, each with its own for loops. This would obviously introduce redundant code, but efficiency is the name of the game here.

Thanks in advance!

Gav

Solution

Switch statements compile to a jump table for consecutive case values and to a chain of if-else comparisons for sparse ones. Either way, you don't want a switch statement in your inner loop for image processing if you care about performance. Restructure the code as shown below instead.

Also, note that I moved the weight calculation out of the inner loop (and swapped the loops for case 2 in order to achieve this). This type of thinking, moving stuff out of the inner loop, will get you the performance you want out of C.

switch (mode)                  /* select the type of calculation */
{
case 0:
    weight = dCentre / maxDistanceEdge;
    for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
             // Calculate the new pixel value given the weight
             ...
        }
    }
    break;
case 1:
    for (x = 0; x < width; x++) {
        weight = (float)x/width;
        for (y = 0; y < height; y++) {
             // Calculate the new pixel value given the weight
             ...
        }
    }
    break;
case 2:
    // note - the loops have been swapped to get the weight calc out of the inner loop
    for (y = 0; y < height; y++) {
        weight = (float)y/height;
        for (x = 0; x < width; x++) {
             // Calculate the new pixel value given the weight
             ...
        }
    }
    break;
case 3:
    weight = dBottomLeft / maxDistanceCorner;
    for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
             // Calculate the new pixel value given the weight
             ...
        }
    }
    break;
case 4:
    weight = dTopRight / maxDistanceCorner;
    for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
             // Calculate the new pixel value given the weight
             ...
        }
    }
    break;
default:
    weight = 1;
    for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
             // Calculate the new pixel value given the weight
             ...
        }
    }
    break;

// etc..
}

OTHER TIPS

If efficiency is more important than code size, then yes you should create redundant routines. The case statement is one of the lower overhead things you can do in C, but it's not zero - it's going to have to branch based on the mode, and so it's going to take time. If you really want max performance, get the case out of the loop, even at the cost of duplicating the loop.

Switch statements are about as efficient as they can possibly be. They're compiled to a jump table. In fact, that's why switch is as limited as it is: you can only write a switch for which the compiler can build a jump table based on a fixed integral value.

Compared with the maths you are doing in the loop, the overhead of the switch will probably be minimal. Having said that, the only way to be sure is to write both versions and time them.

Switch/case is extremely fast compared to the equivalent with if/else: it is typically implemented as a jump table. However it still has a cost.

While you are optimizing things:

1) Try to loop over lines, not over columns (switch x and y "for" loops), one solution may be incredibly faster than the other, due to cache memory management.

2) Replacing all divisions by multiplications of the (pre-calculated) inverse will give you significant gain, and probably an acceptable precision loss.
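A rough sketch of both tips together (the buffer layout, `img`, and `weight_by_x` are illustrative assumptions, not the asker's actual code):

```c
#include <stddef.h>

/* Sketch of tips 1 and 2: walk the image one row (line) at a time, which is
 * cache-friendly for a row-major buffer, and replace the per-pixel division
 * by a multiplication with a reciprocal computed once up front. */
void weight_by_x(float *img, int width, int height)
{
    float inv_width = 1.0f / (float)width;        /* pre-calculated inverse */
    for (int y = 0; y < height; y++) {
        float *row = img + (size_t)y * width;     /* start of line y */
        for (int x = 0; x < width; x++) {
            float weight = (float)x * inv_width;  /* was: (float)x / width */
            row[x] *= weight;
        }
    }
}
```

Note that `x * inv_width` is not always bit-identical to `x / width`, which is the "acceptable precision loss" mentioned in tip 2.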

For efficiency's sake, you'd better move the switch outside the loop.

I'd use function pointers like this:

double fun0(void) { return dCentre / maxDistanceEdge; }
double fun1(void) { return (float)x / width; }  /* x, width etc. must be visible at file scope */
/* and so on ... */

double (*fun)(void);

switch (mode)                  /* select the type of calculation */
{
    case 0: fun = fun0;
            break;
    case 1: fun = fun1;
            break;
    case 2: fun = fun2;
            break;
    case 3: fun = fun3;
            break;
    case 4: fun = fun4;
            break;
    default : fun = fun_default;
            break;
}

for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
             weight = fun();
             // Calculate the new pixel value given the weight
             ...
        }
}

It adds function-call overhead, but it shouldn't be too big since you pass no parameters to the function. I think it is a good trade-off between performance and readability.

EDIT: If you use GCC, you can get rid of the function call by using its "labels as values" extension: pick the right label within the switch once, then just jump to it every time. I think it should save a few more cycles.
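For what it's worth, a minimal sketch of that idea (it relies on the GCC/Clang "labels as values" extension, so it is not portable C; `run`, the mode numbering, and the accumulated `sum` are illustrative stand-ins for the real pixel update):

```c
/* The mode is resolved to a label address once, outside the loops, and
 * each iteration jumps straight to the right weight formula. */
float run(int mode, int width, int height)
{
    /* dispatch table of label addresses, indexed by mode (GCC extension) */
    static void *calc[] = { &&mode_x, &&mode_y, &&mode_const };
    void *target = calc[(mode >= 0 && mode < 2) ? mode : 2];
    float weight = 0.0f, sum = 0.0f;

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            goto *target;              /* one indirect jump per pixel */
        mode_x:     weight = (float)x / width;  goto done;
        mode_y:     weight = (float)y / height; goto done;
        mode_const: weight = 1.0f;              goto done;
        done:
            sum += weight;             /* stand-in for the pixel update */
        }
    }
    return sum;
}
```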

Switches shouldn't produce any significant overhead; they get compiled into a sort of array of pointers at the low end, and then it's effectively:

jmp {baseaddress}+switchcasenum

This would probably depend on how good your CPU's branch predictor is, and how your compiler generates the code for the switch. For such a small number of cases, it might generate a decision tree, in which case normal CPU branch prediction should be able to remove most of the overhead. Things might be a bit worse if it generates a switch table...

That said, the best way to find out is to profile it and see.

In addition to Jim's advice, try swapping the order of the loops. Whether loop-swapping is ideal for case 1 would require testing, but I suspect it is. You almost always want your x coordinate in the inner loop to improve paging performance, since consecutive x values are adjacent in memory and each iteration then tends to stay in the same general memory area. A mobile device with limited resources may have little enough RAM that this difference is emphasized.

Sorry to bump this thread, but it seems to me that the switch is FAR from the problem.

The real problem with the efficiency in that case is the divisions. All of the denominators (width, height, max...) appear to be constant throughout the image. If my guess is right, they are plain variables, set when the image is loaded so that any size can be handled at runtime; but that also means the compiler cannot reduce the divisions to much cheaper multiplications, as it could if they were declared const. My suggestion is to pre-calculate the inverses of these values and multiply instead. As far as I can remember, a multiplication takes about 10 clock cycles, whereas a division takes around 70. That is a saving of roughly 60 cycles per pixel, and with the above-mentioned 5000x5000 image, that's an estimated speed increase of 1.5 seconds on a 1 GHz CPU.

It depends on the chip, the compiler, the details of the code, and so on... but this will often be implemented as a jump table, which should be pretty fast.

BTW-- Understanding this kind of thing is a pretty good argument for spending a couple of weeks learning some assembly at some point in your career...

Using a switch is probably better both for speed and programmer time. You're making less redundant code and it probably won't require a new stack frame.

Switches are so efficient that they can be used for really weird and confusing black magic.

but efficiency is the name of the game here.

Iterating over an image buffer to compute new pixel values is a typical embarrassingly parallel problem, so you might want to consider pushing some of the work into worker threads. That should speed up the operation more noticeably than micro-optimizations such as the switch/case concern.

Also, instead of doing the branching instructions every time, you could invoke a function pointer from an array of function pointers, where the index serves as your mode identifier.

So that you end up with calls such as:

computeWeight[mode](pixel);

With 5000x5000 pixels, the function call overhead could also be reduced by calling the function for a range of pixels, rather than individual pixels.

You could also use loop unrolling and parameter passing by reference/pointer, in order to optimize this further.
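A small sketch combining those two suggestions (the `row_fn` signature, the function names, and the in-place `*=` update are assumptions for illustration): each function in the array processes a whole row, so the indirect call happens once per line rather than once per pixel.

```c
#include <stddef.h>

/* A per-row weight function, selected by indexing an array with the mode. */
typedef void (*row_fn)(float *row, int width, int y, int height);

static void row_weight_x(float *row, int width, int y, int height)
{
    (void)y; (void)height;               /* unused for this mode */
    for (int x = 0; x < width; x++)
        row[x] *= (float)x / width;
}

static void row_weight_y(float *row, int width, int y, int height)
{
    float weight = (float)y / height;    /* constant for the whole row */
    for (int x = 0; x < width; x++)
        row[x] *= weight;
}

static const row_fn compute_row[] = { row_weight_x, row_weight_y };

void apply_mode(float *img, int width, int height, int mode)
{
    for (int y = 0; y < height; y++)     /* one call per row, not per pixel */
        compute_row[mode](img + (size_t)y * width, width, y, height);
}
```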

Many good points are already given. Only thing I could think of to add to this, is to move the most frequent cases up in the switch and the least frequent down.

So if case 4 happens more often than case 1, it should be above it:

switch (mode) {
    case 4:
        // ..
        break;
    case 1:
        // ..
        break;
}

Too bad you weren't using C++, because then the switch statement could be replaced with polymorphism.

Cheers !

There are a lot of creative suggestions in this thread for avoiding having to write 5 separate functions.

Unless you read 'mode' from a file or from typed input, the calculation method can be determined at compile time. As a general rule, you don't want to move calculations from compile time to run time.

Either way, the code would be easier to read, and no-one would be confused as to whether you meant to leave out the break statement in the first case.

Also, when you get bugs in the surrounding code, you will not have to check whether the enum was set to the wrong value.
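As a sketch of that point (the `MODE` macro and `pixel_weight` helper are hypothetical): when the mode is a compile-time constant, the switch condition is constant and the compiler can discard the dead branches entirely.

```c
/* If the mode is fixed when the program is built, express it as a constant
 * (or macro) so the switch folds away at compile time. */
#define MODE 2

static inline float pixel_weight(int x, int y, int width, int height)
{
    switch (MODE) {          /* constant condition: resolved by the compiler */
    case 1:  return (float)x / width;
    case 2:  return (float)y / height;
    default: return 1.0f;
    }
}
```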

With regard to the inner loops: counting from 0 up to var is better done as var down to 0, since the decrement sets the zero flag for free (6502 days). This approach also means "width" is loaded into x once and can then be forgotten about; the same goes for "height". Also, pixels in memory are usually laid out left-to-right, top-to-bottom, so definitely have x as your inner loop.

for (y = height; y--;) {
    for (x = width; x--;) {
         weight = fun();
         // Calculate the new pixel value given the weight
         ...
    }
}

Also, and very importantly, your switch statement only has 2 cases that use x or y. The rest are constants.

switch (mode)                  /* select the type of calculation */
{
    case 0:
        weight = dCentre / maxDistanceEdge;
        break;
    //case 1:
    //    weight = (float)x/width;
    //    break;
    //case 2:
    //    weight = (float)y/height;
    //    break;
    case 3:
        weight = dBottomLeft / maxDistanceCorner;
        break;
    case 4:
        weight = dTopRight / maxDistanceCorner;
        break;
    default:
        weight = 1;
        break;
}

So, unless the mode is 1 or 2, weight is calculated once before the loop:

... Y loop code here

    if (mode == 2) { weight = (float)y/height; } // calc only once per Y loop

    ... X loop here

        if (mode == 1) { weight = (float)x/width; } // after this all cases have filled weight
        calc_pixel_using_weight(weight);

I have found switch statements to be very unkind when the case values are sparse. For fewer than about 4 cases I'd go for if-then-else and make sure the most common cases are at the top. If the first condition catches 90% of cases, you've basically hit a home run; likewise, if some other condition occurs in less than 1% of cases, put it last.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow