how to implement hash function `h(k) = (A·k mod 2^w) >> (w – r)` in Java
-
05-06-2021 - |
Вопрос
IMPORTANT NOTICE:
This is not a discussion thread for people to give me their opinion about hashing. I just need to know how to make the given function work in java -- an example would be best.
PROBLEM:
Trying to hone my understanding of hash functions for a pending interview, I watch two free lectures by MIT computer science professors (http://videolectures.net/mit6046jf05_leiserson_lec08/). So after the lecture, I am trying to implement the following hash function in java.
h(k) = (A·k mod 2^w) >> (w – r)
WHERE
r: m, the size of the array, is a power of 2 such that m=2^r
w: the computer has w-bit words, such as 32-bit or 64-bit computer
k: the value I am to find a key for
A: a random odd number (prime would be great) between 2^(w-1) and 2^w
I thought this would be easy to implement in java. But when I do 2^w where w=32, I get inaccurate results in Java. In real life 2^32 = 4294967296
but not in java, which truncates the result to 2^31 - 1
or 2147483647
.
Does anyone know how to fix this problem so to implement the function in Java?
EDIT:
I see a lot of the replies focus on 32. What if my computer is 64 bit? I am stuck with setting w = 32
because I am using Java?
Решение
Some of the terms are redundant because Java assumes this behaviour anyway.
A·k mod 2^w
In Java, integer multiplication overflows and thus does a mod 2^w
(with a sign). The fact that it has a sign doesn't matter if you are then shifting by at least one bit.
Shift of (w - r)
is the same as a shift of -r
in Java (the w is implied by the type)
private static final int K_PRIME = (int) 2999999929L;
public static int hash(int a, int r) {
// return (a * K_PRIME % (2^32)) >>> (32 - r);
return (a * K_PRIME) >>> -r;
}
for 64-bit
private static final long K_PRIME = new BigInteger("9876534021204356789").longValue();
public static long hash(long a, int r) {
// return (a * K_PRIME % (2^64)) >>> (64 - r);
return (a * K_PRIME) >>> -r;
}
I have written this example to show you can do the same thing in BigInteger and why you wouldn't. ;)
public static final BigInteger BI_K_PRIME = new BigInteger("9876534021204356789");
private static long K_PRIME = BI_K_PRIME.longValue();
public static long hash(long a, int r) {
// return (a * K_PRIME % (2^64)) >>> (64 - r);
return (a * K_PRIME) >>> -r;
}
public static long biHash(long a, int r) {
return BigInteger.valueOf(a).multiply(BI_K_PRIME).mod(BigInteger.valueOf(2).pow(64)).shiftRight(64 - r).longValue();
}
public static void main(String... args) {
Random rand = new Random();
for (int i = 0; i < 10000; i++) {
long a = rand.nextLong();
for (int r = 1; r < 64; r++) {
long h1 = hash(a, r);
long h2 = biHash(a, r);
if (h1 != h2)
throw new AssertionError("Expected " + h2 + " but got " + h1);
}
}
int runs = 1000000;
long start1 = System.nanoTime();
for (int i = 0; i < runs; i++)
hash(i, i & 63);
long time1 = System.nanoTime() - start1;
long start2 = System.nanoTime();
for (int i = 0; i < runs; i++)
biHash(i, i & 63);
long time2 = System.nanoTime() - start2;
System.out.printf("hash with long took an average of %,d ns, " +
"hash with BigInteger took an average of %,d ns%n",
time1 / runs, time2 / runs);
}
prints
hash with long took an average of 3 ns, \
hash with BigInteger took an average of 905 ns
Другие советы
Neither int
nor long
would be sufficiently large enough to hold all of the values you'd need in 2^(w-1). You would be best served with BigInteger
.
Let's look what number % 2^32
actually does: It gets the remainder of the division by 2^32. If you have a range from 0 to 2^32, the computer will automatically do the modulo for you, because it throws away everything above 2^32.
Let's take 8 instead of 32, and switch to binary number system:
1000 1000 % 1 0000 0000 = 1000 1000
1 1000 1000 % 1 0000 0000 = 1000 1000
So what you should do is to limit the number to the range of the computer. If you would use e.g. c++, it would be as simple as declaring the value as unsigned int
. The first 1
of the second example above would simply be truncated because it does not fit into the variable.
In java, you don't have unsigned integers. If you calculate A * k
, and that results in an overflow, you may get a signed value. But as the only thing you have to do next is to do a right shift, this should not matter.
So my suggestion is to simply drop the modulo calculation. Try it, I'm not quite sure whether it works.
The Java primative int
has a range of minimum value of -2,147,483,648 and a maximum value of 2,147,483,647
Check out this link for details on the primatives.
I recommend using a long
instead of an int
.