Question

Haskell and Python don't seem to agree on MurmurHash2 results. Python, Java, and PHP return the same results, but Haskell does not. Am I doing something wrong with MurmurHash2 in Haskell?

Here is my code for Haskell Murmurhash2:

import Data.Digest.Murmur32

-- Hash the String "woohoo" with seed 1 and print the result as a Word32.
main :: IO ()
main = print $ asWord32 $ hash32WithSeed 1 "woohoo"

And here is the code written in Python:

import murmur

if __name__ == "__main__":
    print(murmur.string_hash("woohoo", 1))

Python returned 3650852671, while Haskell returned 3966683799.

Solution 2

The murmur-hash package (I am its author) does not promise to compute the same hashes as implementations in other languages. If you need hashes that are compatible with other software, I suggest you create newtype wrappers that compute hashes the way you want. For text in particular, you need to at least specify the encoding. In your case you could convert the text to an ASCII string using Data.ByteString.Char8.pack, but that still won't give you the same hash, because the ByteString instance is more of a placeholder.
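To illustrate why the encoding has to be pinned down, here is a small Python sketch (not part of either library) showing that the same text becomes different byte sequences under different encodings. Any byte-oriented hash will therefore see different input unless both sides agree on the encoding first. The variable names are only for illustration.

```python
# Illustrative only: the same text maps to different byte sequences
# depending on the encoding, so a byte-oriented hash sees different input.
text = "woohoo"

encodings = ["ascii", "utf-8", "utf-16-le", "utf-32-le"]
encoded = {name: text.encode(name) for name in encodings}

for name, raw in encoded.items():
    # Print the encoding name, the byte count, and the raw bytes in hex.
    print(f"{name:10} {len(raw):2d} bytes  {raw.hex()}")
```

For pure ASCII text the `ascii` and `utf-8` encodings coincide, while `utf-32-le` spends four bytes on every character.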

BTW, I'm not actively improving that package because MurmurHash2 has been superseded by MurmurHash3, but I keep accepting patches.

Other tips

From a quick inspection of the sources, it looks like the algorithm operates on 32 bits at a time. The Python version gets these by simply grabbing 4 bytes at a time from the input string, while the Haskell version feeds in each character as its own 32-bit word holding that character's Unicode code point.

It's therefore not surprising that they yield different results.
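The difference in chunking can be sketched in Python. The code below follows the published MurmurHash2 reference algorithm (32-bit, little-endian reads) as I understand it. The names `byte_words`, `char_words`, and `murmur2_bytes` are hypothetical and exist only for this illustration. Nothing here is claimed to reproduce the exact output of the Python `murmur` module or the Haskell murmur-hash package.

```python
import struct

M = 0x5BD1E995      # multiplication constant from the MurmurHash2 reference code
R = 24              # shift constant from the reference code
MASK = 0xFFFFFFFF   # keep all arithmetic within 32 bits


def byte_words(data: bytes) -> list:
    """Complete 4-byte groups read as little-endian 32-bit words (tail excluded)."""
    full = len(data) - len(data) % 4
    return [struct.unpack_from("<I", data, i)[0] for i in range(0, full, 4)]


def char_words(text: str) -> list:
    """One 32-bit word per character: its Unicode code point."""
    return [ord(c) for c in text]


def murmur2_bytes(data: bytes, seed: int) -> int:
    """MurmurHash2 (32-bit) over raw bytes, following the reference algorithm."""
    h = (seed ^ len(data)) & MASK
    for k in byte_words(data):
        k = (k * M) & MASK
        k ^= k >> R
        k = (k * M) & MASK
        h = ((h * M) & MASK) ^ k
    tail = data[len(data) - len(data) % 4:]
    if tail:
        # Equivalent to the reference code's fall-through switch on 1-3 bytes.
        h ^= int.from_bytes(tail, "little")
        h = (h * M) & MASK
    h ^= h >> 13
    h = (h * M) & MASK
    h ^= h >> 15
    return h


print([hex(w) for w in byte_words(b"woohoo")])  # one full word; "oo" is the tail
print(char_words("woohoo"))                     # six words, one per character
print(murmur2_bytes(b"woohoo", 1))
```

For "woohoo", the byte-oriented reading mixes one 32-bit word plus a 2-byte tail, whereas the per-character reading would mix six separate words, so the two approaches feed different data into the same mixing steps.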

Licensed under: CC-BY-SA with attribution