Are standard hash functions like MD5 or SHA1 quaranteed to be unique for small input (4 bytes)?

StackOverflow https://stackoverflow.com/questions/17877830

  •  04-06-2022
  •  | 
  •  

Domanda

Scenario:

I'm writing web service, that will act like identity provider for 3pty application. I have to send to this 3pty application some unique identifier of our user. In our database, unique user identifier is integer (4 bytes, 32 bites). Per our security rules I can't send those in plain form - so sending them out hashed (trough function like MD5 or SHA1) was my first idea.

Problem:

The result of MD5 is 16 bytes, result of SHA1 is 40 bytes, I know they can't be unique for larger input sets, but given the fact my input set is only 4 bytes long (smaller then hashed results) - are they guaranteed to be unique, or am I doomed to some poor-man hash function (like xoring the integer input with some number, shifting bites, adding predefined bites, etc.) ?

È stato utile?

Soluzione

For what you're trying to achieve (preventing a 3rd party from determining your user identifier), a straight MD5 or SHA1 hash is insufficient. 32 bits = about 4 billion values, it would take less than 2 hours for the 3rd party to brute force every value (@1m hashes/sec). I'd really suggest using HMAC-SHA1 instead.

As for collisions, this question has an extremely good answer on their likelihood. tl;dr For 32-bits of input, a collision is excessively small.

If your user identifiers aren't random (they increment by 1 or there is a known algorithm for creating them), then there's no reason you can't generate every hash to make sure that no collision will occur.

This will check the first 10,000,000 integers for a collision with HMAC-SHA1 (will take about 2 minutes to run):

public static bool checkCollisionHmacSha1(byte[] key){
    HMACSHA1 mac = new HMACSHA1(key);
    HashSet<byte[]> values = new HashSet<byte[]>();
    bool collision = false;
    for(int i = 0; i < 10000000 && collision == false; i++){
        byte[] value = BitConverter.GetBytes(i);
        collision = !values.Add(mac.ComputeHash(value));
        if (collision)
            break;
    }
    return collision;
}

Altri suggerimenti

First, SHA1 is 20 bytes not 40 bytes.

Second, although input is very small, there still may be a collision. It is best to test this, but I do not know a feasible way to do that.

In order to prevent any potential collision:

1 - Hash your input and produce the 16/20 bytes of hash
2 - Spray your actual integer onto this hash. 
    Like put a byte of your int every 4/5 bytes.

    This will guarantee the uniqueness by using the input itself.

Also, take a look at Collision Column part

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top