Question

I am looking for a hashing algorithm that produces a 31/32 bit signed/unsigned integer as a digest for a utf8 string with the purpose of using the output for seeding a prng, such as a Park-Miller-Carta LCG or a Mersenne-Twister.

I have looked into FNV1 and FNV1a, but they provide very close values for similar strings differing in their last character; I would like to have a low collision hash that radically changes upon minimal modifications on the input string. Performance is not an issue.

My current approach consists in a dirty LCG that uses character codes and a prime number as multipliers:

a = 524287;
for ( i = 0; i < n; i ++ )
a = ( a * string.charCodeAt ( i ) * 16807 + 524287 ) % 2147483647;

Please let me know of any better alternatives.

Was it helpful?

Solution

If you're generating 32-bit value, consider using classic CRC32. FNV is suposed to be fast alternative to CRC, and you're saying, that performance is not an issue.

OTHER TIPS

Use SHA-2

It is the best/latest hashing algorithm out there. It is always advisable to go with standard algorithms.

Any cryptographically strong hash will have the properties you want, but generate more bits, but simple truncation of the result to 32 bits would be fine. I presume cryptographic strength is not an actual requirement so that flawed (but widely used) hash schemes like MD5 would be adequate - and readily available in many libraries.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top