Question

I am doing a code review for a project, and they have generated a name-based UUID using SHA-256 as the hashing algorithm.

I found some Java code that creates name-based (hashing) UUIDs using SHA-256.

The Java code takes the first 16 bytes of the SHA-256 hash, sets the UUID type (version) to 5, and sets the RFC 4122 variant. Type 5 is the SHA-1 name-based type, not a SHA-256 one.

I found another implementation (sorry lost the link) that did the same thing but set the UUID type to 6. Type 6 is not a type found in RFC 4122.

So my question is: is there a standard extension to RFC 4122 for generating a name-based UUID using SHA-256?


Solution

Creating UUIDs of version 3 or 5 is not a cryptographic act, thus the cryptographic strength of the hashing method is pretty much irrelevant.

The fact that SHA-1 has been (partially!) broken in the past only means it is no longer suitable for certain cryptographic tasks. It doesn't make SHA-1 unsuitable as a hash algorithm in general, as SHA-1 still produces very good, well-balanced hash values. After all, there is no requirement that it be hard or impossible to guess the "name" that was used to calculate the UUID; the only requirement is that two different names are very unlikely to result in equal UUIDs, and for that SHA-1 is still more than adequate. Thus there has never been a need for a migration strategy from SHA-1 to SHA-256.
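For reference, here is a minimal sketch of how a regular version 5 UUID can be built in Java, which has no built-in method for it. The class name `NameUuidSketch` and the method names are my own (hypothetical), and I am assuming the name is encoded as UTF-8; the namespace constant is the DNS namespace ID listed in RFC 4122, Appendix C.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

final class NameUuidSketch {
    // DNS namespace ID from RFC 4122, Appendix C.
    static final UUID NAMESPACE_DNS =
            UUID.fromString("6ba7b810-9dad-11d1-80b4-00c04fd430c8");

    // Version 5: SHA-1 over (namespace bytes || name bytes), truncated to 16 bytes.
    static UUID v5(UUID namespace, String name) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(toBytes(namespace));
            // Assumption: the name is encoded as UTF-8.
            sha1.update(name.getBytes(StandardCharsets.UTF_8));
            byte[] hash = sha1.digest(); // 20 bytes, only the first 16 are used

            hash[6] = (byte) ((hash[6] & 0x0f) | 0x50); // version nibble = 5
            hash[8] = (byte) ((hash[8] & 0x3f) | 0x80); // RFC 4122 variant (10xx)

            ByteBuffer buf = ByteBuffer.wrap(hash, 0, 16); // big-endian by default
            return new UUID(buf.getLong(), buf.getLong());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // Serializes a UUID into its 16 bytes in network byte order.
    static byte[] toBytes(UUID u) {
        return ByteBuffer.allocate(16)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    public static void main(String[] args) {
        UUID id = v5(NAMESPACE_DNS, "python.org");
        System.out.println(id + " version=" + id.version() + " variant=" + id.variant());
    }
}
```

Nothing in there depends on SHA-1 being hard to attack; the hash only has to spread different names evenly over the available bits.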

The site you are referencing writes:

SHA-256 for example has a better resistance against preimage and collision attacks than SHA-1 and should retain these properties even after truncation. For that reason I developed the following class using SHA-256 to create the message digest, instead of MD5 or SHA-1.

But that's irrelevant. UUID generation does not need to be safe against attacks. The author of that code misunderstood the purpose of SHA-1 in UUID generation.

RFC 4122 states:

  6. Security Considerations

    Do not assume that UUIDs are hard to guess; they should not be used as security capabilities (identifiers whose mere possession grants access), for example. A predictable random number source will exacerbate the situation.

So what would be the point of "preimage and collision attacks" in that case?
What would you want to attack with them?

And using SHA-256 for the UUID format that is specified to use SHA-1 is a very bad idea! There is a reason why UUIDs based on MD5 have a different version than those based on SHA-1 (instead of just switching the hash algorithm and keeping the version): two different data sets used for UUID generation will hardly ever have the same SHA-1 hash (the chances of a collision here are really, really tiny), but it could be that the SHA-1 hash of data set A equals the truncated SHA-256 hash of data set B (this is much more likely than the first case). The same would hold true for MD5 and SHA-1, which is why each is confined to its own sample space by using a different version (it works like namespaces, which avoid collisions even if the same name exists in both spaces).
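To illustrate the version field acting like a namespace, here is a rough sketch that stamps a version onto the first 16 bytes of an arbitrary digest. `VersionFieldSketch` and `fromHash` are hypothetical names of mine, and for brevity the sketch hashes only the raw name bytes without a namespace ID, which is also what Java's built-in `UUID.nameUUIDFromBytes` (version 3, MD5) does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

final class VersionFieldSketch {
    // Hashes the input, keeps the first 16 bytes, then overwrites
    // the version nibble and the RFC 4122 variant bits.
    static UUID fromHash(String algorithm, int version, byte[] input) {
        byte[] hash;
        try {
            hash = MessageDigest.getInstance(algorithm).digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " not available", e);
        }
        hash[6] = (byte) ((hash[6] & 0x0f) | (version << 4)); // version nibble
        hash[8] = (byte) ((hash[8] & 0x3f) | 0x80);           // variant 10xx
        ByteBuffer buf = ByteBuffer.wrap(hash, 0, 16);
        return new UUID(buf.getLong(), buf.getLong());
    }

    public static void main(String[] args) {
        byte[] name = "example".getBytes(StandardCharsets.UTF_8);

        UUID md5Based = fromHash("MD5", 3, name);       // same scheme as nameUUIDFromBytes
        UUID sha1Based = fromHash("SHA-1", 5, name);    // proper version 5 layout
        UUID mislabeled = fromHash("SHA-256", 5, name); // what the reviewed code does

        System.out.println("v3 (MD5):           " + md5Based);
        System.out.println("v5 (SHA-1):         " + sha1Based);
        System.out.println("\"v5\" from SHA-256:  " + mislabeled);
        // Both of the last two claim version 5, so nothing in the UUID
        // tells a reader which hash algorithm actually produced it.
        System.out.println(sha1Based.version() + " / " + mislabeled.version());
    }
}
```

The MD5 and SHA-1 results can never be equal because their version nibbles differ, while the mislabeled SHA-256 result lands in the same space as genuine version 5 UUIDs.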

The reason you shouldn't use SHA-1 for cryptographic tasks anymore is that it is possible to create intentional collisions (two data sets which result in the same SHA-1 hash). This is a problem in situations where a SHA-1 hash is used to verify the correctness of security-relevant data; think of certificate fingerprints. If you can create a fake certificate that has the same fingerprint as a real one, you can break TLS connections based on that real certificate with a simple man-in-the-middle attack, without either side of the TLS connection noticing it. You just replace the real cert the server sends with your fake one, and since your fake one has the same fingerprint, it will validate correctly on the client side. Now you can read and manipulate all data sent over that TLS connection.

In situations where security is irrelevant, the fact that you can intentionally create collisions is irrelevant as well. Natural collisions are always possible: the sample space of a hash is limited (2^128 in the case of MD5, 2^160 in the case of SHA-1) while the data space is unlimited, so of course multiple data sets must exist that result in the same hash value. This can only be avoided if the data space is smaller than the sample space (your hash is bigger than all data that you ever plan to hash with it). For security, the only question is "Can you intentionally create such a collision when desired or not?", because collisions do not usually occur by chance. If you just want to store hash values of files to verify correct transmission or to see if a file was altered, and security is no concern here (you don't expect anyone to manipulate files to intentionally match a specific hash value), even MD5 is still sufficient for that task, despite the fact that it is a no-go for cryptographic tasks today.
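As an example of such a non-security use, here is a small sketch that checks whether received data matches a previously stored MD5 checksum. `ChecksumSketch`, `md5Hex` and `matches` are hypothetical names; this only detects accidental corruption and is not meant to stop a deliberate attacker.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ChecksumSketch {
    // Returns the MD5 digest of the data as a lowercase hex string.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // True if the received data still hashes to the stored checksum.
    static boolean matches(byte[] received, String expectedHex) {
        return md5Hex(received).equalsIgnoreCase(expectedHex);
    }

    public static void main(String[] args) {
        byte[] original = "some file content".getBytes(StandardCharsets.UTF_8);
        String stored = md5Hex(original); // saved alongside the file

        byte[] corrupted = "some file c0ntent".getBytes(StandardCharsets.UTF_8);
        System.out.println("intact:    " + matches(original, stored));
        System.out.println("corrupted: " + matches(corrupted, stored));
    }
}
```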

Licensed under: CC-BY-SA with attribution