Checksum for a multipage TIFF document
Question
I want to calculate the checksum for a large TIFF file that might not fit in memory. Will I get a reliable value if I instead calculate the checksum of every page and then the checksum of the array of page checksums? Or will I run into a mathematical problem that I am not seeing, so that the only correct way is to work with the whole file?
Thanks!
Solution
I don't know if I understood the question correctly, but with most checksum algorithms you only have to load a small part of the message into memory at a time. Because of that, you can operate on a stream instead of a buffer held entirely in memory, and this is commonly done.
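To illustrate the streaming approach, here is a minimal Python sketch. The function name stream_md5 and the 1 MiB chunk size are my own choices for illustration, and MD5 is just an example algorithm; any hashlib algorithm with an update() method should work the same way.

```python
import hashlib
import io


def stream_md5(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, reading it in chunks.

    Only one chunk (chunk_size bytes at most) is held in memory at a time.
    """
    digest = hashlib.md5()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes object means end of stream
            break
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # In-memory stand-in for a large file; with a real file you would use
    # open(path, "rb") instead (the path is up to you).
    data = b"example bytes " * 1000
    print(stream_md5(io.BytesIO(data), chunk_size=64))
```

The result is the same as hashing the whole file in one call, because update() simply continues the running hash state.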
Edit:
I only know that you have to be careful with Adler-32 when checksumming short messages: you would not be covering the whole hash space, so false positives are more likely (yes, the array of checksums would probably be a short message).
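A rough sketch of why short inputs are a problem for Adler-32: the low 16 bits of the result are 1 plus the sum of all bytes (modulo 65521), so for a short message that half can only reach a small part of its range. The helper name adler32_low_half_max is hypothetical; zlib.adler32 is the standard library function.

```python
import zlib


def adler32_low_half_max(length):
    """Largest value the low 16 bits of Adler-32 can take for a message
    of the given length in bytes.

    The low half is (1 + sum of bytes) mod 65521, so until the sum is
    large enough to wrap, it cannot exceed 1 + 255 * length.
    """
    return min(1 + 255 * length, 65520)


if __name__ == "__main__":
    # Four 32-bit page checksums packed together are only 16 bytes long.
    worst_case = b"\xff" * 16
    low_half = zlib.adler32(worst_case) & 0xFFFF
    print(low_half, adler32_low_half_max(16))
```

So for a 16-byte array of checksums, the low half never gets above 4081 out of a possible 65520.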
With crypto hashes I honestly don't know. My intuition is that md5(msg1 + msg2 + ...) is as reliable as md5(md5(msg1) + md5(msg2) + ...), but we'll have to wait for someone smarter than me to give a definitive answer :)
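For what it's worth, here is a small sketch of the two schemes side by side. It does not settle the reliability question; it only shows that both are deterministic and that they give different values, so whichever one you pick has to be used consistently on both sides of the comparison. The function names are made up for illustration, and the "pages" are just byte strings standing in for whatever per-page data you extract.

```python
import hashlib


def hash_of_concatenation(pages):
    """md5(msg1 + msg2 + ...), computed incrementally."""
    digest = hashlib.md5()
    for page in pages:
        digest.update(page)
    return digest.hexdigest()


def hash_of_hashes(pages):
    """md5(md5(msg1) + md5(msg2) + ...), using the raw 16-byte digests."""
    outer = hashlib.md5()
    for page in pages:
        outer.update(hashlib.md5(page).digest())
    return outer.hexdigest()


if __name__ == "__main__":
    pages = [b"page one", b"page two", b"page three"]
    print(hash_of_concatenation(pages))
    print(hash_of_hashes(pages))
```

One difference that follows from the construction: the hash of hashes depends on where the page boundaries fall, while the hash of the concatenation does not.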