how to implement hash function `h(k) = (A·k mod 2^w) >> (w – r)` in Java

StackOverflow https://stackoverflow.com/questions/10416550

  •  05-06-2021
  •  | 
  •  

Вопрос

IMPORTANT NOTICE:

This is not a discussion thread for people to give me their opinion about hashing. I just need to know how to make the given function work in java -- an example would be best.

PROBLEM:

Trying to hone my understanding of hash functions for a pending interview, I watch two free lectures by MIT computer science professors (http://videolectures.net/mit6046jf05_leiserson_lec08/). So after the lecture, I am trying to implement the following hash function in java.

h(k) = (A·k mod 2^w) >> (w – r)
WHERE
r: m, the size of the array, is a power of 2 such that m=2^r
w: the computer has w-bit words, such as 32-bit or 64-bit computer
k: the value I am to find a key for
A: a random odd number (prime would be great) between 2^(w-1) and 2^w    

I thought this would be easy to implement in java. But when I do 2^w where w=32, I get inaccurate results in Java. In real life 2^32 = 4294967296 but not in java, which truncates the result to 2^31 - 1 or 2147483647.

Does anyone know how to fix this problem so to implement the function in Java?

EDIT:

I see a lot of the replies focus on 32. What if my computer is 64 bit? I am stuck with setting w = 32 because I am using Java?

Это было полезно?

Решение

Some of the terms are redundant because Java assumes this behaviour anyway.

A·k mod 2^w

In Java, integer multiplication overflows and thus does a mod 2^w (with a sign). The fact that it has a sign doesn't matter if you are then shifting by at least one bit.

Shift of (w - r) is the same as a shift of -r in Java (the w is implied by the type)

private static final int K_PRIME = (int) 2999999929L;

public static int hash(int a, int r) {
   // return (a * K_PRIME % (2^32)) >>> (32 - r);
   return (a * K_PRIME) >>> -r;
}

for 64-bit

private static final long K_PRIME = new BigInteger("9876534021204356789").longValue();

public static long hash(long a, int r) {
    // return (a * K_PRIME % (2^64)) >>> (64 - r);
    return (a * K_PRIME) >>> -r;
}

I have written this example to show you can do the same thing in BigInteger and why you wouldn't. ;)

public static final BigInteger BI_K_PRIME = new BigInteger("9876534021204356789");
private static long K_PRIME = BI_K_PRIME.longValue();

public static long hash(long a, int r) {
    // return (a * K_PRIME % (2^64)) >>> (64 - r);
    return (a * K_PRIME) >>> -r;
}

public static long biHash(long a, int r) {
    return BigInteger.valueOf(a).multiply(BI_K_PRIME).mod(BigInteger.valueOf(2).pow(64)).shiftRight(64 - r).longValue();
}

public static void main(String... args) {
    Random rand = new Random();
    for (int i = 0; i < 10000; i++) {
        long a = rand.nextLong();
        for (int r = 1; r < 64; r++) {
            long h1 = hash(a, r);
            long h2 = biHash(a, r);
            if (h1 != h2)
                throw new AssertionError("Expected " + h2 + " but got " + h1);
        }
    }

    int runs = 1000000;
    long start1 = System.nanoTime();
    for (int i = 0; i < runs; i++)
        hash(i, i & 63);
    long time1 = System.nanoTime() - start1;

    long start2 = System.nanoTime();
    for (int i = 0; i < runs; i++)
        biHash(i, i & 63);
    long time2 = System.nanoTime() - start2;
    System.out.printf("hash with long took an average of %,d ns, " +
            "hash with BigInteger took an average of %,d ns%n",
            time1 / runs, time2 / runs);
}

prints

hash with long took an average of 3 ns, \
    hash with BigInteger took an average of 905 ns

Другие советы

Neither int nor long would be sufficiently large enough to hold all of the values you'd need in 2^(w-1). You would be best served with BigInteger.

Let's look what number % 2^32 actually does: It gets the remainder of the division by 2^32. If you have a range from 0 to 2^32, the computer will automatically do the modulo for you, because it throws away everything above 2^32.

Let's take 8 instead of 32, and switch to binary number system:

  1000 1000 % 1 0000 0000 = 1000 1000
1 1000 1000 % 1 0000 0000 = 1000 1000

So what you should do is to limit the number to the range of the computer. If you would use e.g. c++, it would be as simple as declaring the value as unsigned int. The first 1 of the second example above would simply be truncated because it does not fit into the variable.

In java, you don't have unsigned integers. If you calculate A * k, and that results in an overflow, you may get a signed value. But as the only thing you have to do next is to do a right shift, this should not matter.

So my suggestion is to simply drop the modulo calculation. Try it, I'm not quite sure whether it works.

The Java primative int has a range of minimum value of -2,147,483,648 and a maximum value of 2,147,483,647

Check out this link for details on the primatives.

I recommend using a long instead of an int.

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top