Question

I have a program that read urls in a file and does a gethostbyname() on each URL host. This call is quite consuming. I want to cache them.

Is there a very simple map-base code snippet in C out there that I could use to do the caching? (I just don't want to reinvent the wheel).

It has to have the following points :

  • Open-source with a permissive license (think BSD or public domain).
  • Very simple : ideally less than 100 LOC
  • Keys are char* and values void*. No need to copy them.
  • No real need to implement remove(), but contains() is either needed or put() should replace the value.

PS: I tagged it homework, since it could be. I'm just being very lazy and do want to avoid all the common pitfalls I could encounter while reimplementing.

Was it helpful?

Solution

Here's a very simple and naive one

  • Fixed bucket size
  • No delete operation
  • inserts replaces the key and value, and can optionally free them

:

#include <string.h>
#include <stdlib.h>

#define NR_BUCKETS 1024

struct StrHashNode {
    char *key;
    void *value;
    struct StrHashNode *next;

};

struct StrHashTable {
    struct StrHashNode *buckets[NR_BUCKETS];
    void (*free_key)(char *);
    void (*free_value)(void*);
    unsigned int (*hash)(const char *key);
    int (*cmp)(const char *first,const char *second);
};

void *get(struct StrHashTable *table,const char *key)
{
    unsigned int bucket = table->hash(key)%NR_BUCKETS;
    struct StrHashNode *node;
    node = table->buckets[bucket];
    while(node) {
        if(table->cmp(key,node->key) == 0)
            return node->value;
        node = node->next;
    }
    return NULL;
}
int insert(struct StrHashTable *table,char *key,void *value)
{
    unsigned int bucket = table->hash(key)%NR_BUCKETS;
    struct StrHashNode **tmp;
    struct StrHashNode *node ;

    tmp = &table->buckets[bucket];
    while(*tmp) {
        if(table->cmp(key,(*tmp)->key) == 0)
            break;
        tmp = &(*tmp)->next;
    }
    if(*tmp) {
        if(table->free_key != NULL)
            table->free_key((*tmp)->key);
        if(table->free_value != NULL)
            table->free_value((*tmp)->value);
        node = *tmp;
    } else {
        node = malloc(sizeof *node);
        if(node == NULL)
            return -1;
        node->next = NULL;
        *tmp = node;
    }
    node->key = key;
    node->value = value;

    return 0;
}

unsigned int foo_strhash(const char *str)
{
    unsigned int hash = 0;
    for(; *str; str++)
        hash = 31*hash + *str;
    return hash;
}

#include <stdio.h>
int main(int argc,char *argv[])
{
    struct StrHashTable tbl = {{0},NULL,NULL,foo_strhash,strcmp};

    insert(&tbl,"Test","TestValue");
    insert(&tbl,"Test2","TestValue2");
    puts(get(&tbl,"Test"));
    insert(&tbl,"Test","TestValueReplaced");
    puts(get(&tbl,"Test"));

    return 0;
}

OTHER TIPS

Christoper Clark's hashtable implementation is very straightforward. It is more than 100 lines, but not by much.

Clark's code seems to have made its way into Google's Conccurrency Library as a parallelization example.

std::map in C++ is a red-black tree under the hood; what about using an existing red-black tree implementation in C? The one I linked is more like 700 LOC, but it's pretty well commented and looks sane from the cursory glance I took at it. You can probably find others; this one was the first hit on Google for "C red-black tree".

If you're not picky about performance you could also use an unbalanced binary tree or a min-heap or something like that. With a balanced binary tree, you're guaranteed O(log n) lookup; with an unbalanced tree the worst case for lookup is O(n) (for the pathological case where nodes are inserted in-order, so you end up with one really long branch that acts like a linked-list), but (if my rusty memory is correct) the average case is still O(log n).

You can try using following implemntation

clib

memcached?

Not a code snippet, but a high performance distributed caching engine.

Not lazy, deeply sensible to avoid writing this stuff.

How's this library never used it myself but it seems to claim to do what you ask for.

Dave Hanson's C Interfaces and Implementations includes a nice hash table, as well as many other useful modules. The hash table clocks in at 150 lines, but that's including memory management, a higher-order mapping function, and conversion to array. The software is free, and the book is worth buying.

Found an implementation here : c file and h file that's fairly close to what you asked. W3C license

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top