Is CRC32 really so bad for file integrity check?
Question
Of course MD5 is better than CRC32, SHA1 is better than MD5, and so on. But they are also much slower than CRC32.
Right now, I am thinking about how to check the consistency of a file being transferred, and CRC32 is the fastest option.
I haven't been able to find out how bad CRC32 is for integrity checks. In other words, how likely is it that CRC32 will fail to detect a corrupted file?
Solution
Quoting from http://www.mathpages.com/home/kmath458.htm:
So, if we assume that any corruption of our data affects our string in a completely random way, i.e., such that the corrupted string is totally uncorrelated with the original string, then the probability of a corrupted string going undetected is 1/(2^n). This is the basis on which people say a 16-bit CRC has a probability of 1/(2^16) = 1.5E-5 of failing to detect an error in the data, and a 32-bit CRC has a probability of 1/(2^32), which is about 2.3E-10 (less than one in a billion).
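To make the numbers in the quote concrete, here is a minimal Python sketch of the 1/(2^n) estimate. The function name `undetected_probability` is made up for illustration, and the calculation rests on the same assumption as the quote: the corrupted data is completely uncorrelated with the original.

```python
def undetected_probability(n_bits: int) -> float:
    """Chance that a fully random corruption produces the same n-bit check value.

    Assumes the corrupted string is uncorrelated with the original,
    as in the quoted passage.
    """
    return 1.0 / (2 ** n_bits)


if __name__ == "__main__":
    for n in (16, 32):
        # Prints the estimate for a 16-bit and a 32-bit CRC.
        print(f"{n}-bit CRC: {undetected_probability(n):.2e}")
```

Running it reproduces the orders of magnitude given in the quote: roughly 1.5E-5 for 16 bits and 2.3E-10 for 32 bits.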
My opinion: CRC-32 is more than enough for detecting accidental errors, and it is widely used for that purpose. However, it is not secure if you want to use it as a cryptographic "hash function".
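For the file-transfer case in the question, a rough sketch of computing a CRC-32 over a file could look like the following. It uses `zlib.crc32` from the Python standard library; the helper names `crc32_of_chunks` and `file_crc32` and the 1 MiB chunk size are assumptions chosen for illustration, not part of the original answer.

```python
import zlib
from typing import Iterable


def crc32_of_chunks(chunks: Iterable[bytes]) -> int:
    """Feed the data to zlib.crc32 piece by piece, carrying the running value."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
    # Mask to an unsigned 32-bit value for a consistent result.
    return crc & 0xFFFFFFFF


def file_crc32(path: str, chunk_size: int = 1 << 20) -> int:
    """CRC-32 of a file, read in fixed-size chunks to keep memory use low."""
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        return crc32_of_chunks(iter(lambda: f.read(chunk_size), b""))
```

The sender and the receiver would each call `file_crc32` on their copy and compare the two integers. A mismatch means the file was corrupted; a match means corruption is very unlikely, but it says nothing about deliberate tampering.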
OTHER TIPS
Collisions (different data producing the same hash output) are far more likely with CRC-32 because its output is only 32 bits wide, compared with other algorithms: MD5 is 128 bits, SHA-1 is 160 bits, and SHA-2 (the SHA-256/512 series) is 224 to 512 bits, depending on the variant you use. Also, no collision has been found for the SHA-2 series.
For more on the mathematics and the probability of a collision in your data, look up hash collisions and the birthday paradox.
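As a rough illustration of the birthday paradox mentioned here, the sketch below uses the standard approximation p ≈ 1 − exp(−k(k−1) / 2^(n+1)) for the chance that at least two of k items share the same n-bit value. It assumes the check values are uniformly random, and the function name `collision_probability` is hypothetical. Note that this is about collisions among many different files, which is a separate question from whether one corrupted transfer goes undetected.

```python
import math


def collision_probability(num_items: int, n_bits: int) -> float:
    """Approximate chance that at least two of num_items random n-bit values match.

    Uses the birthday-bound approximation 1 - exp(-k(k-1) / 2^(n+1)),
    assuming uniformly distributed values.
    """
    exponent = num_items * (num_items - 1) / (2.0 * 2.0 ** n_bits)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for bits in (32, 128, 160):
        p = collision_probability(100_000, bits)
        print(f"{bits}-bit output, 100000 items: {p:.3g}")
```

Under this approximation, a 32-bit value reaches about a 50% chance of some collision at around 77,000 items, while for 128-bit and wider outputs the probability stays negligible at any realistic number of files.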