Pregunta

I am attempting to implement a CARP hash in Python as described in the following IETF draft:

http://tools.ietf.org/html/draft-vinod-carp-v1-03#section-3.1

Specifically:

3.1. Hash Function

The hash function outputs a 32 bit unsigned integers based on a zero-terminated ASCII input string. The machine name and domain names of the URL, the protocol, and the machine names of each member proxy should be evaluated in lower case since that portion of the URL is case insensitive.

Because irreversibility and strong cryptographic features are unnecessary for this application, a very simple and fast hash function based on the bitwise left rotate operator is used.

For (each char in URL): URL_Hash += _rotl(URL_Hash, 19) + char ;

Member proxy hashes are computed in a similar manner:

For (each char in MemberProxyName): MemberProxy_Hash += _rotl(MemberProxy_Hash, 19) + char ;

Becaues member names are often similar to each other, their hash values are further spread across hash space via the following additional operations:

MemberProxy_Hash += MemberProxy_Hash * 0x62531965 ; MemberProxy_Hash = _rotl (MemberProxy_Hash, 21) ;

3.2. Hash Combination

Hashes are combined by first exclusive or-ing (XOR) the URL hash by the machine name and then multiplying by a constant and performing a bitwise rotation.

All final and intermediate values are 32 bit unsigned integers.

Combined_Hash = (URL_hash ^ MemberProxy_Hash) ; Combined_Hash += Combined_Hash * 0x62531965 ; Combined_Hash = _rotl(Combined_Hash, 21) ;

I've tried to use numpy to create 32 bit unsigned integers. The first problem arrises when the left bit shift is implemented. Numpy automatically recasts the result as a 64 bit unsigned integer. Same for any arithmetic that would overflow 32 bits.

For example:

from numpy import uint32

def key_hash(data):
    # hash should be a 32-bit unsigned integer
    hashed = uint32()
    for char in data:
        hashed += hashed << 19 + ord(char)
    return hashed

x = key_hash("testkey")
print type(x)

Returns:

type 'numpy.int64'

Any tips of how I constrain this all to 32 bit space? Also, I am a bit confused by the spec in how performing some of these operations like "MemberProxy_Hash += MemberProxy_Hash * 0x62531965" will ever fit in 32 bits as it is calculating the hash.


EDIT:

Based upon feedback, it sounds like the right solution would be:

def key_hash(data):
    # hash should be a 32-bit unsigned integer
    hashed = 0
    for char in data:
        hashed += ((hashed << 19) + (hashed >> 13) + ord(char)) & 0xFFFFFFFF
    return hashed


def server_hash(data):
    # hash should be a 32-bit unsigned integer
    hashed = 0
    for char in data:
        hashed += ((hashed << 19) + (hashed >> 13) + ord(char)) & 0xFFFFFFFF
    hashed += (hashed * 0x62531965) & 0xFFFFFFFF
    hashed = ((hashed << 21) + (hashed >> 11)) & 0xFFFFFFFF
    return hashed


def hash_combination(key_hash, server_hash):
    # hash should be a 32-bit unsigned integer
    combined_hash = (key_hash ^ server_hash) & 0xFFFFFFFF
    combined_hash += (combined_hash * 0x62531965) & 0xFFFFFFFF
    return combined_hash

EDIT #2:

Another fixed version.

def rotate_left(x, n, maxbit=32):
    # assumes 32 bit
    x = x & (2 ** maxbit - 1)
    return ((x << n) | (x >> (maxbit - n)))


def key_hash(data):
    # hash should be a 32-bit unsigned integer
    hashed = 0
    for char in data:
        hashed = (hashed + rotate_left(hashed, 19) + ord(char))
    return hashed


def server_hash(data):
    # hash should be a 32-bit unsigned integer
    hashed = 0
    for char in data:
        hashed = (hashed + rotate_left(hashed, 19) + ord(char))
    hashed = hashed + hashed * 0x62531965
    hashed = rotate_left(hashed, 21)
    return hashed


def hash_combination(key_hash, server_hash):
    # hash should be a 32-bit unsigned integer
    combined_hash = key_hash ^ server_hash
    combined_hash = combined_hash + combined_hash * 0x62531965
    return combined_hash & 0xFFFFFFFF
¿Fue útil?

Solución

Don't bother with numpy uint32. Just use standard Python int. Constrain the result of operations as necessary by doing result &= 0xFFFFFFFF to remove unwanted high-order bits.

def key_hash(data):
    # hash should be a 32-bit unsigned integer
    hashed = 0
    for char in data:
        # hashed += ((hashed << 19) + ord(char)) & 0xFFFFFFFF
        # the above is wrong; it's not masking the final addition.
        hashed = (hashed + (hashed << 19) + ord(char)) & 0xFFFFFFFF
    return hashed

You could do just one final masking but that would be rather slow on long input as the intermediate hashed would be a rather large number.

By the way, the above would not be a very good hash function. The rot in rotl means rotate, not shift.

You need

# hashed += ((hashed << 19) + (hashed >> 13) + ord(char)) & 0xFFFFFFFF
# the above is wrong; it's not masking the final addition.
hashed = (hashed + (hashed << 19) + (hashed >> 13) + ord(char)) & 0xFFFFFFFF

Edit ... a comparison; this code:

def rotate_left(x, n, maxbit=32):
    # assumes 32 bit
    x = x & (2 ** maxbit - 1)
    return ((x << n) | (x >> (maxbit - n)))

def key_hash(data):
    # hash should be a 32-bit unsigned integer
    hashed = 0
    for char in data:
        hashed = (hashed + rotate_left(hashed, 19) + ord(char))
    return hashed

def khash(data):
    h = 0
    for c in data:
        assert 0 <= h <= 0xFFFFFFFF
        h = (h + (h << 19) + (h >> 13) + ord(c)) & 0xFFFFFFFF
    assert 0 <= h <= 0xFFFFFFFF
    return h

guff = "twas brillig and the slithy toves did whatever"
print "yours: %08X" % key_hash(guff)
print "mine : %08X" % khash(guff)

produces:

yours: A20352DB4214FD
mine : DB4214FD

Otros consejos

The following works for me, though maybe a little unpythonic:

from numpy import uint32

def key_hash(data):
    # hash should be a 32-bit unsigned integer
    hashed = uint32()
    for char in data:
        hashed += hashed << uint32(19) + uint32(ord(char))
    return hashed

x = key_hash("testkey")
print type(x)

The problem is that numbers are coerced towards more bits rather than less.

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top