Domanda

I am writing a Linux Kernel driver (for ARM) and in an irq handler I need to check the interrupt bits.

bit
 0/16  End point 0 In/Out interrupt
       (very likely, while In is more likely)
 1/17  End point 1 In/Out interrupt
 ...
15/31  End point 15 In/Out interrupt

Note that more than a bit can be set at a time.

So this is the code:

int i;
u32 intr = read_interrupt_register();

/* ep0 IN */
if(likely(intr & (1 << 0))){
    handle_ep0_in();
}

/* ep0 OUT */
if(likely(intr & (1 << 16))){
    handle_ep0_out();
}

for(i=1;i<16;++i){
    if(unlikely(intr & (1 << i))){
        handle_ep_in(i);
    }
    if(unlikely(intr & (1 << (i + 16)))){
        handle_ep_out(i);
    }
}

(1 << 0) and (1 << 16) would be calculated in compile time, however (1 << i) and (1 << (i + 16)) wouldn't. Also there would be integral comparison and addition in the loop.

Because it is an irq handler, work should be done within the shortest time. This let me think whether I need to optimize it a bit.

Possible ways?

1. Split the loop, seems to make no difference...

/* ep0 IN */
if(likely(intr & (1 << 0))){
    handle_ep0_in();
}

/* ep0 OUT */
if(likely(intr & (1 << 16))){
    handle_ep0_out();
}

for(i=1;i<16;++i){
    if(unlikely(intr & (1 << i))){
        handle_ep_in(i);
    }
}
for(i=17;i<32;++i){
    if(unlikely(intr & (1 << i))){
        handle_ep_out(i - 16);
    }
}

2. Shift intr instead of the value to be compared to?

/* ep0 IN */
if(likely(intr & (1 << 0))){
    handle_ep0_in();
}

/* ep0 OUT */
if(likely(intr & (1 << 16))){
    handle_ep0_out();
}

for(i=1;i<16;++i){
    intr >>= 1;
    if(unlikely(intr & 1)){
        handle_ep_in(i);
    }
}
intr >>= 1;
for(i=1;i<16;++i){
    intr >>= 1;
    if(unlikely(intr & 1)){
        handle_ep_out(i);
    }
}

3. Fully unroll the loop (not shown). That would make the code a bit messy.

4. Any other better ways?

5. Or it's that the compiler will actually generate the most optimized way?


Edit: I was looking for a way to tell the gcc compiler to unroll that particular loop, but it seems that it isn't possible according to my search...

È stato utile?

Soluzione

If we can assume that the number of set bits in intr is low (as it is usually the case in interrupt masks) we can optimize a little bit and write a loop that executes for each bit only once:

void handle (int intr)
{
  while (intr)
  {
    // find index of lowest bit set in intr:
    int bit_id = __builtin_ffs(intr)-1;

    // call handler:
    if (bit_id > 16)
      handle_ep_out (bit_id-16);
    else
      handle_ep_in (bit_id);

    // clear that bit
    // (I think there was a bit-hack out there to simplify this step even further)
    intr -= (1<<bit_id);
  }
}

On most ARM architectures __builtin_ffs will compile down to a CLZ instruction and some arithmetic around it. It should do so for anything but ARM7 and older cores.

Also: When writing interrupt handlers on embedded devices the size of the function makes a difference for performance as well because the instructions have to be loaded into the code-cache. Lean code usually executes faster. A bit overhead is okay if you save memory accesses to memory that is unlikely to be in the cache.

Altri suggerimenti

I would probably go for option 5 myself. Code for readability and let gcc's insane optimisation level -O3 do what it can.

I've seen code generated at that level that I can't even understand.

Any hand-crafted optimisation in C (other than possibly unrolling and using constants rather than runtime bit shifts, a la option 3) is unlikely to outperform what the compiler itself can do.

I think you'll find that the unrolling may not be as messy as you think:

if (  likely(intr & 0x00000001)) handle_ep0_in();
if (  likely(intr & 0x00010000)) handle_ep0_out();

if (unlikely(intr & 0x00000002)) handle_ep_in(1);
if (unlikely(intr & 0x00020000)) handle_ep_out(1);

:

if (unlikely(intr & 0x00008000)) handle_ep_in(15);
if (unlikely(intr & 0x80000000)) handle_ep_out(15);

In fact, you can make it a lot less messier with macros (untested, but you should get the general idea):

// Since mask is a constant, "mask << 32" should be too.

# define chkintr (mask, num) \
    if (unlikely(intr & (mask      ))) handle_ep_in  (num); \
    if (unlikely(intr & (mask << 32))) handle_ep_out (num);

// Special case for high probability bit.

if (likely(intr & 0x00000001UL)) handle_ep0_in();
if (likely(intr & 0x00010000UL)) handle_ep0_out();

chkintr (0x0002UL,  1);  chkintr (0x0004UL,  2);  chkintr (0x0008UL,  3);
chkintr (0x0010UL,  4);  chkintr (0x0020UL,  5);  chkintr (0x0040UL,  6);
chkintr (0x0080UL,  7);  chkintr (0x0100UL,  8);  chkintr (0x0200UL,  9);
chkintr (0x0400UL, 10);  chkintr (0x0800UL, 11);  chkintr (0x1000UL, 12);
chkintr (0x2000UL, 13);  chkintr (0x4000UL, 14);  chkintr (0x8000UL, 15);

The only step up from there is hand-coding assembly language and there's still the good possibility that gcc may be able to outperform you :-)

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top