Question

I am writing a Linux Kernel driver (for ARM) and in an irq handler I need to check the interrupt bits.

bit
 0/16  End point 0 In/Out interrupt
       (very likely, while In is more likely)
 1/17  End point 1 In/Out interrupt
 ...
15/31  End point 15 In/Out interrupt

Note that more than a bit can be set at a time.

So this is the code:

int i;
u32 intr = read_interrupt_register();

/* ep0 IN */
if(likely(intr & (1 << 0))){
    handle_ep0_in();
}

/* ep0 OUT */
if(likely(intr & (1 << 16))){
    handle_ep0_out();
}

for(i=1;i<16;++i){
    if(unlikely(intr & (1 << i))){
        handle_ep_in(i);
    }
    if(unlikely(intr & (1 << (i + 16)))){
        handle_ep_out(i);
    }
}

(1 << 0) and (1 << 16) would be calculated in compile time, however (1 << i) and (1 << (i + 16)) wouldn't. Also there would be integral comparison and addition in the loop.

Because it is an irq handler, work should be done within the shortest time. This let me think whether I need to optimize it a bit.

Possible ways?

1. Split the loop, seems to make no difference...

/* ep0 IN */
if(likely(intr & (1 << 0))){
    handle_ep0_in();
}

/* ep0 OUT */
if(likely(intr & (1 << 16))){
    handle_ep0_out();
}

for(i=1;i<16;++i){
    if(unlikely(intr & (1 << i))){
        handle_ep_in(i);
    }
}
for(i=17;i<32;++i){
    if(unlikely(intr & (1 << i))){
        handle_ep_out(i - 16);
    }
}

2. Shift intr instead of the value to be compared to?

/* ep0 IN */
if(likely(intr & (1 << 0))){
    handle_ep0_in();
}

/* ep0 OUT */
if(likely(intr & (1 << 16))){
    handle_ep0_out();
}

for(i=1;i<16;++i){
    intr >>= 1;
    if(unlikely(intr & 1)){
        handle_ep_in(i);
    }
}
intr >>= 1;
for(i=1;i<16;++i){
    intr >>= 1;
    if(unlikely(intr & 1)){
        handle_ep_out(i);
    }
}

3. Fully unroll the loop (not shown). That would make the code a bit messy.

4. Any other better ways?

5. Or it's that the compiler will actually generate the most optimized way?


Edit: I was looking for a way to tell the gcc compiler to unroll that particular loop, but it seems that it isn't possible according to my search...

Was it helpful?

Solution

If we can assume that the number of set bits in intr is low (as it is usually the case in interrupt masks) we can optimize a little bit and write a loop that executes for each bit only once:

void handle (int intr)
{
  while (intr)
  {
    // find index of lowest bit set in intr:
    int bit_id = __builtin_ffs(intr)-1;

    // call handler:
    if (bit_id > 16)
      handle_ep_out (bit_id-16);
    else
      handle_ep_in (bit_id);

    // clear that bit
    // (I think there was a bit-hack out there to simplify this step even further)
    intr -= (1<<bit_id);
  }
}

On most ARM architectures __builtin_ffs will compile down to a CLZ instruction and some arithmetic around it. It should do so for anything but ARM7 and older cores.

Also: When writing interrupt handlers on embedded devices the size of the function makes a difference for performance as well because the instructions have to be loaded into the code-cache. Lean code usually executes faster. A bit overhead is okay if you save memory accesses to memory that is unlikely to be in the cache.

OTHER TIPS

I would probably go for option 5 myself. Code for readability and let gcc's insane optimisation level -O3 do what it can.

I've seen code generated at that level that I can't even understand.

Any hand-crafted optimisation in C (other than possibly unrolling and using constants rather than runtime bit shifts, a la option 3) is unlikely to outperform what the compiler itself can do.

I think you'll find that the unrolling may not be as messy as you think:

if (  likely(intr & 0x00000001)) handle_ep0_in();
if (  likely(intr & 0x00010000)) handle_ep0_out();

if (unlikely(intr & 0x00000002)) handle_ep_in(1);
if (unlikely(intr & 0x00020000)) handle_ep_out(1);

:

if (unlikely(intr & 0x00008000)) handle_ep_in(15);
if (unlikely(intr & 0x80000000)) handle_ep_out(15);

In fact, you can make it a lot less messier with macros (untested, but you should get the general idea):

// Since mask is a constant, "mask << 32" should be too.

# define chkintr (mask, num) \
    if (unlikely(intr & (mask      ))) handle_ep_in  (num); \
    if (unlikely(intr & (mask << 32))) handle_ep_out (num);

// Special case for high probability bit.

if (likely(intr & 0x00000001UL)) handle_ep0_in();
if (likely(intr & 0x00010000UL)) handle_ep0_out();

chkintr (0x0002UL,  1);  chkintr (0x0004UL,  2);  chkintr (0x0008UL,  3);
chkintr (0x0010UL,  4);  chkintr (0x0020UL,  5);  chkintr (0x0040UL,  6);
chkintr (0x0080UL,  7);  chkintr (0x0100UL,  8);  chkintr (0x0200UL,  9);
chkintr (0x0400UL, 10);  chkintr (0x0800UL, 11);  chkintr (0x1000UL, 12);
chkintr (0x2000UL, 13);  chkintr (0x4000UL, 14);  chkintr (0x8000UL, 15);

The only step up from there is hand-coding assembly language and there's still the good possibility that gcc may be able to outperform you :-)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top