Question

Btrfs supports crc32c, xxhash, sha256 and blake2b as checksums when storing and reading files. crc32c and xxhash are designed to detect random errors, while sha256 and blake2b are considered cryptographically secure. My question is, in what scenarios do I need cryptographically secure checksums in btrfs (or other filesystems)?

Is it only in scenarios where someone can intentionally change the data (e.g. an attacker), or is xxhash not good enough to detect random errors in some cases?

What is the probability that some data changes on its own (random error) and at the same time has the same checksum as before the error occurred (collision)?

Also, not as important: does disk usage increase when using blake2b instead of e.g. xxhash, or do inode sizes remain the same? Should I even care about this?

Anyway, thanks!


Solution

One question is which width of xxhash is used.

Due to the birthday paradox, the chance of an accidental collision is higher than you might think. For example, with a 32-bit CRC there's roughly a 50% chance that some two items share a checksum once you have about 65,000 different items to hash (based on the square-root approximation of the birthday paradox, sqrt(2^32) = 65,536). And 50% is much higher than you'd like to risk.
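To put numbers on the birthday estimate, here is a minimal Python sketch. It assumes an idealized checksum whose values are uniformly distributed, and the function names are made up for illustration. It uses the standard approximation p ≈ 1 - exp(-n(n-1)/(2N)) for n items and N = 2^bits possible checksum values, which puts the 50% point for 32 bits somewhat above the plain square-root figure.

```python
import math

def collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance that at least two of n_items share a checksum.

    Assumes checksum values are uniformly distributed over 2**bits values.
    """
    space = 2 ** bits
    exponent = n_items * (n_items - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)

def items_for_probability(p: float, bits: int) -> float:
    """Approximate number of items needed to reach collision probability p."""
    space = 2 ** bits
    return math.sqrt(2 * space * math.log(1 / (1 - p)))

if __name__ == "__main__":
    # The widths below are illustrative, not a statement about any filesystem.
    for bits in (32, 64, 256):
        n_half = items_for_probability(0.5, bits)
        print(f"{bits:3d}-bit: ~{n_half:.3g} items for a 50% collision chance")
    print(f"65000 items, 32-bit: {collision_probability(65000, 32):.3f}")
```

Under this model the square-root figure is the right order of magnitude: 65,000 items give a bit under a 40% chance with 32 bits, and the chance becomes vanishingly small for the wider checksums.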

Both sha256 and blake2b are much wider than this. (Or, that is, Blake2b can be.) So the chance of an accidental collision is much lower.

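(In this context an "accidental collision" would be a data corruption that hashes to the same value as the original. Strictly speaking, the birthday figure is the chance that any two items among them share a checksum; a single corrupted block matching its own original checksum is a rarer event, roughly 1 in 2^32 for an ideal 32-bit checksum. In a file system you'd of course have many, many more than 65,000 items to hash, whether you do it by disk blocks or by files.)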
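For the per-corruption view, a rough sketch of the odds that a corrupted block still matches its stored checksum. It assumes the checksum of corrupted data behaves like a uniformly random value, which is an idealization (real CRCs, for instance, are guaranteed to catch certain error patterns). The function names and the event count are hypothetical.

```python
def undetected_probability(bits: int) -> float:
    """Chance that one corrupted block still matches its stored checksum,
    assuming the checksum of the corrupted data is uniformly random."""
    return 2.0 ** -bits

def expected_undetected(corruption_events: int, bits: int) -> float:
    """Expected number of corruptions that slip through unnoticed."""
    return corruption_events * undetected_probability(bits)

if __name__ == "__main__":
    events = 1_000_000  # hypothetical number of corruption events
    # The widths below are illustrative, not a statement about any filesystem.
    for bits in (32, 64, 256):
        print(f"{bits:3d}-bit: p = {undetected_probability(bits):.3g} per event, "
              f"expected undetected in {events} events = "
              f"{expected_undetected(events, bits):.3g}")
```

Even at 32 bits, an undetected corruption is expected only about once per 2^32 corruption events under this model, and each extra bit of width halves that chance again.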

So it isn't necessary to have a cryptographically secure hash, just a good, wide one. But for practical purposes, the fastest generally available wide hashes are the cryptographic ones, because that's where the time and effort has been spent to optimize them.

(I have no idea how this impacts the space required by inodes, if it does at all. But in the grand scheme of things, unless you're on a really space-restricted file system in an embedded system of some kind, the difference is likely negligible. Someone else could answer that.)

Licensed under: CC-BY-SA with attribution