How to compute modulo of a hash?
08-12-2020
Question
Let's say that I have a set of users in my database that have GUIDs as their IDs. I use xxhash to generate a fixed-length hash for each value, so that I can then "bucketize" them and do random sampling with the help of the modulo function.
That said, if I have a hash such as 367b50760441849e, I want to be able to use hash % 20 == 0 to randomly pick 5% of the population (hence, 20 "buckets"). This is the approach used by Kusto hash() with a modulo argument.
With this in mind, what approach should I use to get an integer value from the hash, so that I can compute the modulo?
Solution
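Any good hash is uniformly distributed, which means that you can also assume a uniform distribution after applying modulo n, as long as $n < 2^{M/2}$, where $M$ is the number of bits in your hash (see here). Under that bound the bucket sizes differ by at most one hash value, a relative imbalance of roughly $n / 2^{M} < 2^{-M/2}$. So for a 32-bit hash (for example XXH32, or SHA-1 truncated to 32 bits) you would take the modulo by at most $2^{16}$.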
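To make the effect of the modulus on uniformity concrete, here is a small sketch that enumerates every value of a deliberately tiny 8-bit hash space and counts how many land in each of 20 buckets. The helper name residue_counts and the 8-bit size are illustrative assumptions, chosen only so the imbalance is visible; with a real 32-bit or 64-bit hash the same one-value difference becomes negligible.

```python
from collections import Counter


def residue_counts(bits: int, n: int) -> Counter:
    """Count how many of the 2**bits possible hash values fall in each bucket.

    Hypothetical helper for illustration: it enumerates the whole hash space,
    so it is only practical for very small values of `bits`.
    """
    return Counter(value % n for value in range(2 ** bits))


counts = residue_counts(bits=8, n=20)

# 256 = 12 * 20 + 16, so buckets 0..15 receive one extra value.
print(min(counts.values()), max(counts.values()))  # 12 13
```

The buckets never differ by more than one value, so the relative imbalance shrinks as the hash space grows relative to n.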
No special approach is needed to calculate an integer value: what you have there is a hexadecimal representation of the hash, so if you obtained it as a string you just need to parse it into a numeric type. XXH32() and XXH64() both already return unsigned integers (32-bit and 64-bit, respectively).
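As a minimal sketch of that conversion, assuming the hash arrives as a hexadecimal string like the one in the question, Python's built-in int() with base 16 yields the integer, and the modulo follows directly. The helper names bucket_of and in_sample are hypothetical and not part of xxhash or Kusto.

```python
def bucket_of(hex_digest: str, buckets: int) -> int:
    """Parse a hexadecimal hash string and map it to a bucket index."""
    return int(hex_digest, 16) % buckets


def in_sample(hex_digest: str, buckets: int = 20) -> bool:
    """Return True when the hash falls in bucket 0, i.e. a 1-in-`buckets` sample."""
    return bucket_of(hex_digest, buckets) == 0


digest = "367b50760441849e"  # the example hash from the question
print(bucket_of(digest, 20))  # 14
print(in_sample(digest))      # False
```

If the hashing library already hands back an integer, the parsing step can be skipped and the modulo applied to that value directly.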