Since SHA-1's output is uniformly distributed, you can approximate the collision rate using the Birthday Paradox:
Assuming you keep n bits of the SHA-1 output, there is a ~50% chance of a collision in a set containing 2^(n/2) records; in other words, your collision rate is approximately 1/2^(n/2).
If you need a more accurate answer, you can always use the formula in the answer you've referenced in your question.
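If you want to check the estimate numerically, here is a minimal sketch using the standard birthday approximation p ≈ 1 - exp(-k(k-1)/(2N)), where N = 2^n. The function names are made up for illustration, and this is not necessarily the exact formula from the answer you referenced. It suggests that 2^(n/2) records give roughly a 39% chance of a collision, and that 50% is reached at about 1.18 * 2^(n/2) records, so the rule of thumb above is the right order of magnitude.

```python
import math


def collision_probability(records: int, bits: int) -> float:
    """Approximate chance of at least one collision among `records`
    uniformly distributed values that are `bits` bits wide."""
    space = 2 ** bits
    exponent = records * (records - 1) / (2 * space)
    # 1 - exp(-exponent), computed in a numerically stable way
    return -math.expm1(-exponent)


def records_for_probability(p: float, bits: int) -> float:
    """Approximate number of records at which the collision chance reaches p."""
    return math.sqrt(2 * (2 ** bits) * math.log(1 / (1 - p)))


if __name__ == "__main__":
    for bits in (32, 64):
        at_rule_of_thumb = collision_probability(2 ** (bits // 2), bits)
        needed_for_half = records_for_probability(0.5, bits)
        print(f"{bits} bits: p at 2^(n/2) records = {at_rule_of_thumb:.4f}, "
              f"records for 50% = {needed_for_half:.0f}")
```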
So here, if we assume each character is 1 byte (8 bits), you will most likely encounter a collision once you have ~2^(8*8/2) = 4294967296 records (so the collision rate would be 1/4294967296 ≈ 2.33 * 10^-10, or about 2.33 * 10^-8 %, which is very small).
As for the collision rate you discovered with your test program: the ToSHA1Fingerprint() function returns a hexadecimal string, so an 8-character substring of it represents only 4 bytes. The approximate collision rate based on the above formula is therefore 1/2^(4*8/2) = 0.000015258789, or about 0.0015%.
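To make the hex-versus-bytes point concrete, here is a small Python sketch. It uses hashlib rather than your ToSHA1Fingerprint() function, whose implementation I haven't seen, and all helper names are hypothetical. It shows that an 8-character hex prefix decodes to 4 bytes, evaluates the 1/2^(n/2) rule of thumb for that width, and searches for an actual collision on a shorter prefix by hashing the decimal strings "0", "1", "2", and so on.

```python
import hashlib


def sha1_hex_prefix(data: bytes, hex_chars: int) -> str:
    """First `hex_chars` characters of the SHA-1 hex digest of `data`."""
    return hashlib.sha1(data).hexdigest()[:hex_chars]


def prefix_bits(hex_chars: int) -> int:
    """Each hex character encodes 4 bits, not 8."""
    return 4 * hex_chars


def rule_of_thumb_rate(bits: int) -> float:
    """The 1/2^(n/2) approximation from the answer."""
    return 1 / 2 ** (bits / 2)


def first_collision(hex_chars: int, limit: int):
    """Hash the strings "0", "1", ... and return the first pair of inputs
    whose truncated digests match, or None if none is found below `limit`."""
    seen = {}
    for i in range(limit):
        prefix = sha1_hex_prefix(str(i).encode(), hex_chars)
        if prefix in seen:
            return seen[prefix], i
        seen[prefix] = i
    return None


if __name__ == "__main__":
    prefix = sha1_hex_prefix(b"abc", 8)
    print(prefix, "decodes to", len(bytes.fromhex(prefix)), "bytes")
    print("rule-of-thumb rate for 8 hex chars:", rule_of_thumb_rate(prefix_bits(8)))
    # A 4-character prefix is only 16 bits, so a collision turns up quickly.
    print("first 16-bit collision:", first_collision(4, 70000))
```

Running the same search with 8 hex characters takes far more inputs before the first collision appears, which fits the gap between the 4-byte and 8-byte estimates above.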