If a linked list is too slow, the usual answer is to implement a hash table. There many, many possible variations of the data structure and algorithm. I'll just describe open, "single"-hashing, because that's the variation I'm most familiar with.
So with an open hash table, the table is just an array ("closed" hashing has an array, too, but each element is a linked list). We want the array to be, at most, about half-full for performance reasons. And at maximum-capacity, the filled table will actually have one empty slot still, because this simplifies the algorithm.
We also need a hash function which accepts the type of the key values, and returns integers. It's very difficult to predict how the hash function will behave with respect to clustered keys and overall performance. So just make sure it's an isolated function that can easily be changed later. It can be as simple as shifting-around all the bytes and adding them together.
int hash (char *key, int key_length, int table_size)
{
int ret, i;
for (i=0, ret=0; i < key_length; i++)
{
ret += key[i] << i;
}
return abs(ret) % table_size;
}
The table-lookup function uses the hash function to decide where to start looking in the array. If the key isn't found there (determined by doing a memcmp()
on the actual search key and the key stored at that position in the table), it looks at each successive key, wrapping from the end of the array back to the beginning, and declares failure if it finds an empty table element.
#define RETURN_TABLE_I_IF_EQUAL_KEY_OR_EMPTY \
if (memcmp(table + i, &key, sizeof key) == 0 || (key_type)table[i] == 0) \
return table + i;
key_value_pair *hash_lookup(key_value_pair *table, int table_size, key_type key)
{
int h, i;
h = hash(&key, sizeof key, table_size);
i = h;
RETURN_TABLE_I_IF_EQUAL_KEY_OR_EMPTY
for ( ; i < table_size; i++)
RETURN_TABLE_I_IF_EQUAL_KEY_OR_EMPTY
for (i=0; i < h; i++)
RETURN_TABLE_I_IF_EQUAL_KEY_OR_EMPTY
return NULL;
}
We'll need one more function in front of this to handle a few quirks. It can return a NULL pointer which indicates that not only has the key not been found, but the table itself is overfull. An overfull table, which really means "completely full", but we decided earlier that a "full" table should really have one empty element. This means that both for
loops should not run to completion; when it finds an empty table position, that's a failure. With an overfull table, it has to scan the entire table before discovering that the key is not present, thus losing much of the performance advtantage from using a hash at all.
The lookup function can also return a valid pointer to an empty slot. This is also a failure to find the value, but not an error. If adding the key/value pair for the first time, this will be slot to store it.
Or it could return a pointer to the desired table element. And this will be faster than a linear search, be it an array or linked list.
Deleting a key from the table requires us to fill-in the vacated position in the sequence. There are a couple of options.
If you're not worried about the table running out of space (it's set really large, and the lifetime and usage can be controlled), you can overwrite the entry with a deleted special key, distinct from an empty key.
Or, if you want to reclaim the space, too, you'll need to lookup the key, and then scan the rest of the "chain" (sequence of keys up to the next empty slot (including wrap-around)) and move the last key with a matching hash into the key-to-delete's position. Then write-over this moved key/value's position with the empty key. .... oops! This process must be repeated for the this last matching key until we're actually clearing the very last key in the chain. (I need to go fix this in my implementation right now!....)