Frage

I have a byte I'm using for bitflags. I know that one and only one bit in the byte is set at any give time.

Ex: unsigned char b = 0x20; //(00100000) 6th most bit set

I currently use the following loop to determine which bit is set:

int getSetBitLocation(unsigned char b) {
  int i=0;
  while( !((b >> i++) & 0x01) ) { ; }
  return i;
}

How do I most efficiently determine the position of the set bit? Can I do this without iteration?

War es hilfreich?

Lösung

Can I do this without iteration?

It is indeed possible.

How do I most efficiently determine the position of the set bit?

You can try this algorithm. It splits the char in half to search for the top bit, shifting to the low half each time:

int getTopSetBit(unsigned char b) {
  int res = 0;
  if(b>15){
    b = b >> 4;
    res = res + 4;
  }
  if(b>3){
    b = b >> 2;
    res = res + 2;
  }

  //thanks @JasonD
  return res + (b>>1);
}

It uses two comparisons (three for uint16s, four for uint32s...). and it might be faster than your loop. It is definitely not shorter.


Based on the idea by Anton Kovalenko (hashed lookup) and the comment by 6502 (division is slow), I also suggest this implementation (8-bit => 3-bit hash using a de-Bruijn sequence)

int[] lookup = {7, 0, 5, 1, 6, 4, 3, 2};

int getBitPosition(unsigned char b) {
  // return lookup[(b | (b>>1) | (b>>2) | (b>>4)) & 0x7];
  return lookup[((b * 0x1D) >> 4) & 0x7];
}

or (larger LUT, but uses just three terms instead of four)

int[] lookup = {0xFF, 0, 1, 4, 2, 0xFF, 5, 0xFF, 7, 3, 0xFF, 0xFF, 6, 0xFF, 0xFF, 0xFF};

int getBitPosition(unsigned char b) {
  return lookup[(b | (b>>3) | (b>>4)) & 0xF];
}

Andere Tipps

Lookup table is simple enough, and you can reduce its size if the set of values is sparse. Let's try with 11 elements instead of 128:

unsigned char expt2mod11_bits[11]={0xFF,0,1,0xFF,2,4,0xFF,7,3,6,5};
unsigned char pos = expt2mod11_bits[b%11];
assert(pos < 8);
assert(1<<pos == b);

Of course, it's not necessarily more effective, especially for 8 bits, but the same trick can be used for larger sizes, where full lookup table would be awfully big. Let's see:

unsigned int w; 
....
unsigned char expt2mod19_bits[19]={0xFF,0,1,13,2,0xFF,14,6,3,8,0xFF,12,15,5,7,11,4,10,9};
unsigned char pos = expt2mod19_bits[w%19];
assert(pos < 16);
assert(1<<pos == w);

This is a quite common problem for chess programs that use 64 bits to represent positions (i.e. one 64-bit number to store where are all the white pawns, another for where are all the black ones and so on).

With this representation there is sometimes the need to find the index 0...63 of the first or last set bit and there are several possible approaches:

  1. Just doing a loop like you did
  2. Using a dichotomic search (i.e. if x & 0x00000000ffffffffULL is zero there's no need to check low 32 bits)
  3. Using special instruction if available on the processor (e.g. bsf and bsr on x86)
  4. Using lookup tables (of course not for the whole 64-bit value, but for 8 or 16 bits)

What is faster however really depends on your hardware and on real use cases. For 8 bits only and a modern processor I think that probably a lookup table with 256 entries is the best choice...

But are you really sure this is the bottleneck of your algorithm?

unsigned getSetBitLocation(unsigned char b) {
  unsigned pos=0;
  pos = (b & 0xf0) ? 4 : 0; b |= b >>4;
  pos += (b & 0xc) ? 2 : 0; b |= b >>2;
  pos += (b & 0x2) ? 1 : 0; 
  return pos; 
}

It would be hard to do it jumpfree. Maybe with the Bruin sequences ?

Based on log2 calculation in Find the log base 2 of an N-bit integer in O(lg(N)) operations:

int getSetBitLocation(unsigned char c) {
  // c is in {1, 2, 4, 8, 16, 32, 64, 128}, returned values are {0, 1, ..., 7}
  return (((c & 0xAA) != 0) |
          (((c & 0xCC) != 0) << 1) |
          (((c & 0xF0) != 0) << 2));
}

Easiest thing is to create a lookup table. The simplest one will be sparse (having 256 elements) but it would technically avoid iteration.

This comment here technically avoids iteration, but who are we kidding, it is still doing the same number of checks: How to write log base(2) in c/c++

Closed form would be log2(), a la, log2() + 1 But I'm not sure how efficient that is - possibly the CPU has an instruction for taking base 2 logrithms?

if you define

const char bytes[]={1,2,4,8,16,32,64,128}

and use

struct byte{
char data;
int pos;
}
void assign(struct byte b,int i){

b.data=bytes[i];
b.pos=i
}

you don't need to determine the position of the set bit

A lookup table is fast and easy when CHAR_BIT == 8, but on some systems, CHAR_BIT == 16 or 32 and a lookup table becomes insanely bulky. If you're considering a lookup table, I'd suggest wrapping it; make it a "lookup table function", instead, so that you can swap the logic when you need to optimise.

Using divide and conquer, by performing a binary search on a sorted array, involves comparisons based on log2 CHAR_BIT. That code is more complex, involving an initialisation of an array of unsigned char to use as a lookup table for a start. Once you have such the array initialised, you can use bsearch to search it, for example:

#include <stdio.h>
#include <stdlib.h>
void uchar_bit_init(unsigned char *table) {
    for (size_t x = 0; x < CHAR_BIT; x++) {
        table[x] = 1U << x;
    }
}
int uchar_compare(void const *x, void const *y) {
    char const *X = x, *Y = y;
    return (*X > *Y) - (*X < *Y);
}
size_t uchar_bit_lookup(unsigned char *table, unsigned char value) {
    unsigned char *position = bsearch(lookup, c, sizeof lookup, 1, char_compare);
    return position ? position - table + 1 : 0;
}
int main(void) {
    unsigned char lookup[CHAR_BIT];
    uchar_bit_init(lookup);
    for (;;) {
        int c = getchar();
        if (c == EOF) { break; }
        printf("Bit for %c found at %zu\n", c, uchar_bit_lookup(lookup, c));
    }
}

P.S. This sounds like micro-optimisation. Get your solution done (abstracting the operations required into these functions), then worry about optimisations based on your profiling. Make sure your profiling targets the system that your solution will run on if you're going to focus on micro-optimisations, because the efficiency of micro-optimisations differ widely as hardware differs even slightly... It's usually a better idea to buy a faster PC ;)

Lizenziert unter: CC-BY-SA mit Zuschreibung
Nicht verbunden mit StackOverflow
scroll top