Question

Starting from the CPython 3.3, hash randomization is enabled by default. On previous versions it could be turned on by specifying the -R command-line option or by setting the PYTHONHASHSEED environment variable to random.

Citing the documentation:

By default, the __hash__() values of str, bytes and datetime objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.

Does this mean that the generated value will be cryptographically strong?

Was it helpful?

Solution

In Python 3.3, the hash seed is not cryptographically strong; it is generated at startup with the following pseudo-random generator:

/* Fill buffer with pseudo-random bytes generated by a linear congruent
   generator (LCG):

       x(n+1) = (x(n) * 214013 + 2531011) % 2^32

   Use bits 23..16 of x(n) to generate a byte. */
static void
lcg_urandom(unsigned int x0, unsigned char *buffer, size_t size)
{
    size_t index;
    unsigned int x;

    x = x0;
    for (index=0; index < size; index++) {
        x *= 214013;
        x += 2531011;
        /* modulo 2 ^ (8 * sizeof(int)) */
        buffer[index] = (x >> 16) & 0xff;
    }
}

which is not cryptographically strong.

There are also other problems with the hash seeding that still made it possible to force collisions.

Python 3.4 addressed these issues by introducing a more secure hashing algorithm by default, and made it pluggable.

If you need cryptographically strong random numbers in your program use random.SystemRandom() or os.urandom() instead.

OTHER TIPS

Prior to 3.4, Python used a variant of FNV, which is not cryptographically secure. Unfortunately, simply adding a random value to a weak hash function like Python tried doesn't provide any real security. It is easy to generate strings that will have the same FNV hash, even in the presence of randomization, thanks to the weakness of the underlying hash algorithm.

Note that this is true even if the seed is perfectly random and not leaked to the client.

To consider why, imagine a very weak hash function - simply adding up all the characters in the string. In this case, if you add a random value to the beginning, then the hash of any single string will be random. However, if two strings have characters which sum to the same value, then they will hash to the same value, regardless of what the random seed is. Therefore, the random seed doesn't provide any collision resistance. What Python actually did isn't quite this bad, but it isn't much better, either.

In 3.4, Python switched the default algorithm to SipHash, which is beleived to by cryptographically secure against collision DOS attacks. Unfortunately, anyone using 2.x is completely out of luck.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top