Question

Does size matter when choosing the right algorithm to use for a session hash.

I recently read this article and it suggested using whirlpool to create a hash for session id. Whirlpool generates a 128 character hash string, is this too large?

The plan is to store the session hash in a db. Is there much of a difference between maybe using 64 character field (sha256), 96 character field (sha384) or 128 character field (whirlpool)? One of the initial arguments made for whirlpool was the speed vs other algorithms but looking at the speed results sha384 doesn't fair too badly.

There is the option truncate the hash to make it smaller than 128 characters.

I did modify the original code snippet, to allow changing of the algorithm based of the needs.

Update: There was some discussion about string being hashed, so I've included the code.


function generateUniqueId($maxLength = null) {
    $entropy = '';

    // try ssl first
    if (function_exists('openssl_random_pseudo_bytes')) {
        $entropy = openssl_random_pseudo_bytes(64, $strong);
        // skip ssl since it wasn't using the strong algo
        if($strong !== true) {
            $entropy = '';
        }
    }

    // add some basic mt_rand/uniqid combo
    $entropy .= uniqid(mt_rand(), true);

    // try to read from the windows RNG
    if (class_exists('COM')) {
        try {
            $com = new COM('CAPICOM.Utilities.1');
            $entropy .= base64_decode($com->GetRandom(64, 0));
        } catch (Exception $ex) {
        }
    }

    // try to read from the unix RNG
    if (is_readable('/dev/urandom')) {
        $h = fopen('/dev/urandom', 'rb');
        $entropy .= fread($h, 64);
        fclose($h);
    }

    // create hash
    $hash = hash('whirlpool', $entropy);
    // truncate hash if max length imposed
    if ($maxLength) {
        return substr($hash, 0, $maxLength);
    }
    return $hash;
}
Was it helpful?

Solution

The time taken to create the hash is not important, and as long as your database is properly indexed, the storage method should not be a major factor either.

However, the hash has to be transmitted with the client's request every time, frequently as a cookie. Large cookies can add a small amount of additional time to each request. See Yahoo!'s page performance best practices for more information. Smaller cookies, thus a smaller hash, have benefits.

Overall, large hash functions are probably not justified. For their limited scope, good old md5 and sha1 are probably just fine as the source behind a session token.

OTHER TIPS

Yes, size matters.

If it's too short, you run the risk of collisions. You also make it practical for an attacker to find someone else's session by brute-force attack.

Being too long matters less, but every byte of the session ID has to be transferred from the browser to the server with every request, so if you're really optimising things, you may not want an ID that's too long.

You don't have to use all the bits of a hash algorithm, though - there's nothing stopping you from using something like Whirlpool, then only taking the first 128 bits (32 characters in hex). Practically speaking, 128 bits is a good lower bound on length, too.

As erickson points out, though, using a hash is a bit odd. Unless you have at least as much entropy as input as the length of the ID you're using, you're vulnerable to attacks that guess the input to your hash.

The article times out when I try to read it, but I can't think of a good reason to use a hash as a session identifier. Session identifiers should be unpredictable; given the title of the article, it sounds like the authors acknowledge that principle. Then, why not use a cryptographic random number generator to produce session identifiers?

A hash takes input, and if that input is predictable, so is the hash, and that's bad.

SHA1 or MD5 is probably enough for your needs. In practice, the probability of a collision is so small that it will likely never happen.

Ultimately, though, it all depends upon your required level of security. Do also keep in mind that longer hashes are both more expensive to compute and require more storage space.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top