Question

I am not familiar with hashing algorithms and the risks associated with using them, so I have a question about the answer below, which I received on a previous question:

Based on the comment that the hash value, when encoded to ASCII, must fit within 16 ASCII characters, the solution is:

1. Choose some cryptographic hash function (the SHA-2 family includes SHA-256, SHA-384, and SHA-512).
2. Truncate the output of the chosen hash function to 96 bits (12 bytes); that is, keep the first 12 bytes of the hash output and discard the remaining bytes.
3. Base-64-encode the truncated output into 16 ASCII characters (128 bits of storage).

This yields an effectively 96-bit-strong cryptographic hash.
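The three steps above can be sketched in C# roughly as follows. This is a minimal sketch, assuming .NET 5 or later (C# 9 top-level statements); the helper name `TruncatedBase64Hash` and the sample input are hypothetical, chosen only for illustration. The truncation is done with a plain `Array.Copy` of the first 12 bytes.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: hash the input, keep the first 12 bytes (96 bits), Base64-encode them.
string TruncatedBase64Hash(HashAlgorithm algorithm, string input)
{
    // Step 1: hash the UTF-16LE bytes of the input (the same bytes UnicodeEncoding produces).
    byte[] hash = algorithm.ComputeHash(Encoding.Unicode.GetBytes(input));

    // Step 2: truncate to 96 bits by copying the first 12 bytes into a new array.
    byte[] truncated = new byte[12];
    Array.Copy(hash, truncated, truncated.Length);

    // Step 3: 12 bytes are exactly four 3-byte groups, so Base64 yields 16 characters with no '=' padding.
    return Convert.ToBase64String(truncated);
}

// Any member of the SHA-2 family works; only the first 12 bytes of its output are kept.
using (SHA256 sha256 = SHA256.Create())
using (SHA384 sha384 = SHA384.Create())
using (SHA512 sha512 = SHA512.Create())
{
    Console.WriteLine(TruncatedBase64Hash(sha256, "example row"));
    Console.WriteLine(TruncatedBase64Hash(sha384, "example row"));
    Console.WriteLine(TruncatedBase64Hash(sha512, "example row"));
}
```

Whichever SHA-2 function is chosen, the stored value is always 16 characters, because the length depends only on the 12 kept bytes, not on the full digest size.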

If I take a 16-character substring of the base-64-encoded string, is that fundamentally different from keeping the first 12 bytes of the hash output and then base-64-encoding them? If so, could someone please explain how to truncate the byte array (with example code)?

I tested the substring of the full hash value against 36,000+ distinct values and had no collisions. The code below is my current implementation.
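As a rough sense of what a 36,000-value test can show, the standard birthday approximation gives the probability of at least one collision among n values, assuming the truncated 96-bit hash behaves like a uniformly random function (the usual working assumption for SHA-2 output):

```latex
P(\text{collision}) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2 \cdot 2^{96}}\right) \approx \frac{n(n-1)}{2^{97}}

n = 36\,000:\quad \frac{36\,000 \cdot 35\,999}{2^{97}} \approx \frac{1.30 \times 10^{9}}{1.58 \times 10^{29}} \approx 8.2 \times 10^{-21}
```

So finding no collisions at this scale is expected of any sound 96-bit hash; the test is consistent with the approach but cannot measure its strength. Under the same assumption, a collision becomes about 50% likely only around n ≈ 1.18 · 2^48 ≈ 3.3 × 10^14 values.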

Thanks for any help (and clarity) you can provide.

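using System;
using System.Security.Cryptography;
using System.Text;

public static byte[] CreateSha256Hash(string data)
{
    // UTF-16LE bytes of the string (same output as new UnicodeEncoding().GetBytes)
    byte[] dataToHash = Encoding.Unicode.GetBytes(data);

    // SHA256Managed is IDisposable, so release it when done
    using (SHA256 shaM = new SHA256Managed())
    {
        return shaM.ComputeHash(dataToHash); // 32-byte (256-bit) digest
    }
}

public override void InputBuffer_ProcessInputRow(InputBufferBuffer Row)
{
    byte[] hashedData = CreateSha256Hash(Row.HashString);

    // Base64 of all 32 bytes is 44 characters; keep the first 16
    string s = Convert.ToBase64String(hashedData, Base64FormattingOptions.None);
    Row.HashValue = s.Substring(0, 16);
}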

Original post: http://stackoverflow.com/questions/4340471/is-there-a-hash-algorithm-that-produces-a-hash-size-of-64-bits-in-c


Solution

No, there is no difference. However, instead of truncating the array, it's easier to base64-encode only its first 12 bytes, using the Convert.ToBase64String overload that takes an offset and a length:

public override void InputBuffer_ProcessInputRow(InputBufferBuffer Row)
{
    byte[] hashedData = CreateSha256Hash(Row.HashString);

    // Encode only the first 12 bytes (96 bits): exactly 16 Base64 characters, no padding
    Row.HashValue = Convert.ToBase64String(hashedData, 0, 12);
}

Base64 encoding puts 6 bits into each character, so every 3 bytes (24 bits) become 4 characters. As long as you split the data on a 3-byte boundary, that is the same as splitting the string on the corresponding 4-character boundary: 12 bytes are four 3-byte groups, which encode to exactly 16 characters.
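One way to check the boundary argument is to encode the same hash both ways and compare. This is a minimal sketch, assuming .NET 5 or later (C# 9 top-level statements); `CreateSha256Hash` here is a stand-in mirroring the question's helper (it uses `SHA256.Create()` rather than `SHA256Managed`), and the two comparison helpers and sample strings are hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Stand-in for the question's helper: SHA-256 over the UTF-16LE bytes of the string.
byte[] CreateSha256Hash(string data)
{
    using (SHA256 sha = SHA256.Create())
    {
        return sha.ComputeHash(Encoding.Unicode.GetBytes(data));
    }
}

// Question's approach: encode all 32 bytes (44 characters), keep the first 16 characters.
string SubstringOfFullEncoding(byte[] hash) => Convert.ToBase64String(hash).Substring(0, 16);

// Answer's approach: encode only bytes 0..11, which produces exactly 16 characters.
string EncodingOfFirst12Bytes(byte[] hash) => Convert.ToBase64String(hash, 0, 12);

foreach (string input in new[] { "alpha", "beta", "gamma" })
{
    byte[] hash = CreateSha256Hash(input);
    // Each 3-byte group is encoded independently, so the first four groups match in both strings.
    Console.WriteLine($"{input}: {SubstringOfFullEncoding(hash) == EncodingOfFirst12Bytes(hash)}");
}
```

Every line prints `True`, because Base64 encodes each 3-byte group independently of the bytes that follow it.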

If you split the data anywhere between these boundaries, the encoder pads the last group up to the next boundary (filling the missing bits with zeros and appending '=' characters), so the result would not be the same.
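To see the padding effect, the sketch below encodes 12 bytes (on a boundary) and 11 bytes (off a boundary) of the same SHA-256 output. It is a minimal illustration, assuming .NET 5 or later (C# 9 top-level statements); the input string is arbitrary.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Any 32-byte SHA-256 output will do for this illustration.
byte[] hash;
using (SHA256 sha = SHA256.Create())
{
    hash = sha.ComputeHash(Encoding.Unicode.GetBytes("alpha"));
}

string full = Convert.ToBase64String(hash);               // 44 characters
string onBoundary = Convert.ToBase64String(hash, 0, 12);  // 12 bytes -> 16 characters, no padding
string offBoundary = Convert.ToBase64String(hash, 0, 11); // 11 bytes -> 15 characters + one '='

Console.WriteLine(full.Substring(0, 16));
Console.WriteLine(onBoundary);  // same as the line above
Console.WriteLine(offBoundary); // ends with '=', so it is not a prefix of the full string
```

Only the first 14 characters of the 11-byte encoding are guaranteed to match the full string: the 15th character mixes the last 4 bits of byte 10 with zero filler bits instead of the next byte's bits, and the 16th is the '=' pad.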

OTHER TIPS

Truncating the array is as easy as appending a LINQ Take(12) call (this needs using System.Linq;) here:

Change:

 byte[] hashedData = CreateSha256Hash(Row.HashString);

To:

 byte[] hashedData = CreateSha256Hash(Row.HashString).Take(12).ToArray();
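For comparison, the sketch below checks that the LINQ truncation produces the same 12 bytes as a plain `Array.Copy`, and that either array encodes to the same 16 characters as the offset/length overload from the accepted answer. It assumes .NET 5 or later (C# 9 top-level statements) and an arbitrary sample input. `Array.Copy` avoids the LINQ dependency and the enumerator overhead, which may be worth considering in a component that runs once per row.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

byte[] hash;
using (SHA256 sha = SHA256.Create())
{
    hash = sha.ComputeHash(Encoding.Unicode.GetBytes("beta"));
}

// Truncation with LINQ, as in the tip above.
byte[] viaTake = hash.Take(12).ToArray();

// Truncation without LINQ: allocate 12 bytes and copy the prefix.
byte[] viaCopy = new byte[12];
Array.Copy(hash, viaCopy, 12);

// Both arrays hold the same bytes, so they encode to the same 16-character Base64 string.
Console.WriteLine(viaTake.SequenceEqual(viaCopy));
Console.WriteLine(Convert.ToBase64String(viaTake) == Convert.ToBase64String(hash, 0, 12));
```

Both lines print `True`.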
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow