Question

I am designing a file-server application in which I want to check whether a cached file on a client computer is the latest version kept on the server.

I don't quite trust the 'changed date' attribute in the file system, so I want to compare the actual bytes in the file.

I think the fastest way to do this (since sending all the bytes across the web takes some time) is to send the file length and the file's hash to the server. The server checks the file length first; if the lengths match, it computes a hash of its own copy of the file and checks whether it equals the hash the client computed.
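For illustration, here is a minimal sketch of that length-then-hash check in Python. The function names (`file_fingerprint`, `cache_is_current`) and the chunk size are made up for this example, and the transport between client and server is left out entirely; it only shows the comparison logic under those assumptions.

```python
import hashlib
import os


def file_fingerprint(path, algorithm="md5", chunk_size=1 << 20):
    """Return (size_in_bytes, hex_digest) for the file at `path`.

    Hypothetical helper: the file is read in chunks so large files
    do not have to fit in memory.
    """
    digest = hashlib.new(algorithm)
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def cache_is_current(client_size, client_digest, server_path, algorithm="md5"):
    """Server-side check: compare the length first, then the hash."""
    # Cheap test first: different lengths mean different content,
    # so the server file never has to be read.
    if os.path.getsize(server_path) != client_size:
        return False
    _, server_digest = file_fingerprint(server_path, algorithm)
    return server_digest == client_digest
```

The client would call `file_fingerprint` on its cached copy and send the two values; the server would pass them to `cache_is_current` together with the path of its own copy. Because the algorithm is a parameter, another digest supported by `hashlib` (for example `"sha256"`) could be substituted without changing the rest of the code.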

Can anybody tell me how probable hash collisions are when the file size is the same? (I am currently using MD5 for its speed.)

Can I assume that if the file size is the same and the hash is the same, the content is the same?

Thanks!

Solution

Random collisions in MD5 are so improbable that it's almost certainly safe to ignore the possibility.
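To put a rough number on "improbable", the sketch below uses the standard birthday-bound approximation. It assumes an idealized hash whose 128-bit outputs are uniformly random, which says nothing about deliberately crafted inputs; the function name is made up for this example.

```python
import math


def random_collision_probability(num_files, hash_bits=128):
    """Approximate chance that any two of `num_files` distinct files
    share a digest, assuming uniformly random `hash_bits`-bit outputs.

    Birthday-bound approximation: p ~ 1 - exp(-n(n-1) / 2^(bits+1)).
    """
    exponent = num_files * (num_files - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-exponent)


# Example: a billion distinct files hashed with a 128-bit digest.
print(random_collision_probability(10**9))
```

Under that idealized model, even a billion distinct files give a collision probability on the order of 10^-21, and for one specific pair of files it is about 2^-128.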

However, MD5 has been shown to be cryptographically weak, so a malicious adversary could deliberately create files that collide. A famous example is:

On 30 December 2008, a group of researchers announced at the 25th Chaos Communication Congress how they had used MD5 collisions to create an intermediate certificate authority certificate which appeared to be legitimate when checked via its MD5 hash.

Source

OTHER TIPS

I think this Stack Overflow question is similar to yours and offers some helpful insight. They conclude that checking the file date is akin to wearing suspenders with a belt.

Licensed under: CC-BY-SA with attribution