I have the following C code from here:
http://marknelson.us/1989/10/01/lzw-data-compression/
It states that it uses a XOR hashing algorithm to avoid having to search through substring array elements and instead generate an "address" for a substring.
Then there is a hashing function, for which the line below seems to me as being the main part:
index = (hash_character << HASHING_SHIFT) ^ hash_prefix;
I am pretty sure that the shifting part is just to get rid of unused bits, and then a single, simple XOR operation is being applied to a very short substring, from 12 to 16 bits; although I might be very wrong on this.
What is the name, or possible list of algorithm names, that explain this specific XOR hashing algorithm, or if possible, a list of algorithm names that are good for this substring application (like in the LZW dictionary of substrings, etc.)?
#define BITS 12 /* Setting the number of bits to 12, 13*/
#define HASHING_SHIFT (BITS-8) /* or 14 affects several constants. */
#define MAX_VALUE (1 << BITS) - 1 /* Note that MS-DOS machines need to */
#define MAX_CODE MAX_VALUE - 1 /* compile their code in large model if*/
/* 14 bits are selected. */
#if BITS == 14
#define TABLE_SIZE 18041 /* The string table size needs to be a */
#endif /* prime number that is somewhat larger*/
#if BITS == 13 /* than 2**BITS. */
#define TABLE_SIZE 9029
#endif
#if BITS <= 12
#define TABLE_SIZE 5021
#endif
......
......
......
/*
** This is the hashing routine. It tries to find a match for the prefix+char
** string in the string table. If it finds it, the index is returned. If
** the string is not found, the first available index in the string table is
** returned instead.
*/
int find_match(int hash_prefix,unsigned int hash_character)
{
int index;
int offset;
index = (hash_character << HASHING_SHIFT) ^ hash_prefix;
if (index == 0)
offset = 1;
else
offset = TABLE_SIZE - index;
while (1)
{
if (code_value[index] == -1)
return(index);
if (prefix_code[index] == hash_prefix &&
append_character[index] == hash_character)
return(index);
index -= offset;
if (index < 0)
index += TABLE_SIZE;
}
}