Question

I am writing a key-record lookup where I have an index mapping each key to a record number, sorted on the key. Is there a way to do this better than what I have, for speed optimization?

typedef struct
{
    char key[MAX_KEY_LEN];
    int  rec;
} KeyRecPair;

typedef struct
{
    KeyRecPair *map;
    int         numRecs;
} KeyRecMap;

int GetRecFromKey(char *key, KeyRecMap *theMap)
{
    int cmpValue, bottom = 0;
    int half = theMap->numRecs / 2;
    int top = theMap->numRecs - 1;

    while (bottom <= top)   /* <= rather than !=: top can drop below bottom, and != would then read past the array */
    {
        cmpValue = strncmp(key, theMap->map[half].key, MAX_KEY_LEN); 

        if (cmpValue > 0)
        {
            /*top stays*/
            bottom = half + 1;
            half  = bottom + (top - bottom) / 2;
            continue;
        }
        if (cmpValue < 0)
        {
            /*bottom stays*/
            top = half - 1;
            half  = bottom + (top - bottom) / 2;
            continue;
        }
        return theMap->map[half].rec;
    }

    return 0;   /* key not found */
}

Solution

A good chunk of your time will be spent in the strncmp.

I suggest forcing that to be inlined, or rewriting it inline, to avoid the function-call overhead.
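As a sketch of that idea (CompareKeys is an illustrative name, and this assumes the MAX_KEY_LEN and key layout from the question), a hand-written comparison the compiler can inline might look like:

/* Sketch: byte-at-a-time comparison, mirroring the semantics of
 * strncmp(a, b, MAX_KEY_LEN) but with no call overhead per probe. */
static inline int CompareKeys(const char *a, const char *b)
{
    int i;
    for (i = 0; i < MAX_KEY_LEN; i++)
    {
        int diff = (unsigned char)a[i] - (unsigned char)b[i];
        if (diff != 0)
            return diff;      /* characters differ */
        if (a[i] == '\0')
            return 0;         /* both strings ended early */
    }
    return 0;                 /* equal through MAX_KEY_LEN chars */
}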

If you are feeling brave, it may be possible to unroll the loop once or twice and see a performance gain.

If your string is actually a fixed-length array of char, you could make the length a multiple of 4 and compare 4 bytes at a time with an unsigned int comparison, instead of 1 byte at a time.
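A sketch of that, assuming MAX_KEY_LEN is a multiple of 4 and keys are zero-padded to the full length (CompareKeyWords is an illustrative name). One caveat: the less-than/greater-than result of a whole-word compare only matches strcmp ordering on big-endian machines; equality tests are safe on any byte order.

#include <stdint.h>
#include <string.h>

/* Sketch: compare keys one 32-bit word at a time. memcpy keeps the
 * loads legal w.r.t. alignment and strict aliasing; an optimizing
 * compiler turns them into plain word loads. */
static int CompareKeyWords(const char *a, const char *b)
{
    int i;
    for (i = 0; i < MAX_KEY_LEN; i += 4)
    {
        uint32_t wa, wb;
        memcpy(&wa, a + i, sizeof wa);
        memcpy(&wb, b + i, sizeof wb);
        if (wa != wb)                      /* ordering valid on big-endian only */
            return (wa < wb) ? -1 : 1;
    }
    return 0;
}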

If you don't have a profiler, you should get one. Profilers make it easy to see what the relative costs of various implementations are.

Another option would be to pick a different way to organize your data. Check out AVL trees for inspiration. Choosing some sort of hashing function, as others have mentioned, may be a viable option.

OTHER TIPS

The bsearch library function performs a binary search over an array, given a suitable compare function you implement. Being a library function, it is well-optimised and (hopefully) bug-free.
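For instance, the lookup from the question might be rewritten on top of bsearch like this (a sketch reusing the question's types; GetRecFromKeyBsearch and CompareKeyToPair are illustrative names):

#include <stdlib.h>
#include <string.h>

/* Comparison callback: bsearch passes the search key as the first
 * argument and a pointer to an array element as the second. */
static int CompareKeyToPair(const void *k, const void *elem)
{
    return strncmp((const char *)k,
                   ((const KeyRecPair *)elem)->key,
                   MAX_KEY_LEN);
}

int GetRecFromKeyBsearch(const char *key, const KeyRecMap *theMap)
{
    KeyRecPair *found = bsearch(key, theMap->map,
                                (size_t)theMap->numRecs,
                                sizeof(KeyRecPair), CompareKeyToPair);
    return found ? found->rec : 0;   /* 0 when the key is absent */
}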

Instead of using a binary search to locate the item, a hash map might be more suitable because it has O(1) lookup characteristics. However, a naive approach can be slow when there are many collisions. This paper describes a way to create a hashmap-like tree with O(log(n) / log(32)) access time, which generally outperforms normal hash map implementations (the fixed array + linked list kind).
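For reference, a minimal sketch of the naive "fixed array + linked list" hash map mentioned above; the names, bucket count, and hash function are illustrative, not from the original post:

#include <string.h>

#define NUM_BUCKETS 1024

typedef struct HashNode
{
    const char      *key;
    int              rec;
    struct HashNode *next;   /* collision chain */
} HashNode;

static unsigned HashKey(const char *key)
{
    unsigned h = 5381;
    while (*key)
        h = h * 33 + (unsigned char)*key++;   /* djb2-style hash */
    return h % NUM_BUCKETS;
}

int HashLookup(HashNode *buckets[NUM_BUCKETS], const char *key)
{
    HashNode *n;
    for (n = buckets[HashKey(key)]; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0)
            return n->rec;
    return 0;   /* not found */
}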

Any chance you could use a key that isn't a string, or at least the shortest possible strings? (What is MAX_KEY_LEN's value?) The strncmp on every iteration of the loop is likely one of the slower parts of the search.

Is there a reason for wanting to optimize this? Have you run the program with a profiler and determined that the lookup takes a significant part of the total runtime? Or are you just curious about how fast you can get it? (Either is, in my opinion, a good reason.) If you are just randomly optimizing for the heck of it, don't.

Also, remember to benchmark. It's hard to tell which of two versions of a function is faster on a modern system (it was easier on my Z80). The number of cache misses may matter more than the number of mispredicted branches.

The only potential optimization I can think of is to use something like the golden ratio when calculating half, instead of dividing the remaining subset into two halves with equal numbers of elements, that is:

        if (cmpValue > 0)
        {
                /*top stays*/
                bottom = half + 1;
                half = bottom + (top - bottom) * 3 / 5;
                continue;
        }
        if (cmpValue < 0)
        {
                /*bottom stays*/
                top = half - 1;
                half = bottom + (top - bottom) * 2 / 5;
                continue;
        }

Instead of dividing by 2, you can take advantage of the bit-shift operator:

=> for /2 you can use >> 1
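That is, the midpoint line would become the following. This is only safe because (top - bottom) is never negative here; note that compilers already turn /2 into a shift for unsigned values, and for signed ints they emit a shift plus a fixup, which the explicit shift avoids when the value is known non-negative.

half = bottom + ((top - bottom) >> 1);   /* shift in place of /2 */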

Since you're going to have to compute half once per loop, why not just do that once, just before it is used? That would cut two complex-looking (at least, relatively speaking) lines of code.

Although I would expect a decent optimizer to do this for you, I'd put theMap->map into a local so that it has half a chance of ending up in a register instead of dereferencing it on each access. Again, I'd expect the optimizer to do this for you, so you might also want to check the assembly output.
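Putting those two suggestions together, a sketch of the restructured function (GetRecFromKey2 is an illustrative name; whether it beats the original is for the profiler to say):

/* Sketch: half computed once per pass, theMap->map hoisted into a
 * local so the pointer has a chance of living in a register. */
int GetRecFromKey2(const char *key, const KeyRecMap *theMap)
{
    KeyRecPair *map = theMap->map;
    int bottom = 0;
    int top = theMap->numRecs - 1;

    while (bottom <= top)
    {
        int half = bottom + (top - bottom) / 2;   /* one midpoint per pass */
        int cmpValue = strncmp(key, map[half].key, MAX_KEY_LEN);

        if (cmpValue > 0)
            bottom = half + 1;
        else if (cmpValue < 0)
            top = half - 1;
        else
            return map[half].rec;
    }
    return 0;   /* key not found */
}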

I looked at Visual Studio 2008's output in a release build, and it does a pretty good job on the code. For example, the comparison code looks like this:

; 30   :         if (cmpValue > 0)
test    eax, eax
jle SHORT $LN11@GetRecFrom
; 31   :         {
; omitted inner block for > case.
$LN11@GetRecFrom:
; 37   :         if (cmpValue < 0)
jge SHORT $LN2@GetRecFrom

Basically, it goes from one branch straight to the next without retesting cmpValue. Nice touch.

There is a slight benefit to putting theMap->map into a local, but it's tiny. If MAX_KEY_LEN is not a nice multiple of 4 and structs aren't padded, you should definitely put the int first in your struct.
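For example, a sketch of the suggested reordering:

/* Sketch: rec first keeps the int naturally aligned even when
 * MAX_KEY_LEN is not a multiple of 4 and the struct is unpadded. */
typedef struct
{
    int  rec;
    char key[MAX_KEY_LEN];
} KeyRecPair;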

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow