Yes, opening a file in text mode can result in different data being read, because newlines are translated for you from the platform-native format to \n. Thus, files containing \r\n will give you a different checksum when read on Windows vs. a POSIX platform.
Open files in binary mode instead:
with open(file_loc, 'rb') as file_to_read:
    data = file_to_read.read()  # raw bytes, no newline translation
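For the checksum case, a minimal sketch of hashing a file in binary mode might look like the following. The helper name sha256_of_file and the chunk size are assumptions made for illustration, not something from the question; reading in chunks just avoids loading a large file into memory at once.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Hash the raw bytes of a file (hypothetical helper).

    Because the file is opened with 'rb', no newline translation or
    decoding takes place, so the digest is the same on every platform.
    """
    digest = hashlib.sha256()
    with open(path, 'rb') as file_to_read:
        # read() returns b'' at end of file, which stops the iteration
        for chunk in iter(lambda: file_to_read.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling sha256_of_file(file_loc) on both machines should then give matching digests as long as the bytes on disk are identical.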
Note that the same applies when writing a file. If you receive data from a POSIX system that uses \n line endings and write it to a file opened in text mode on Windows, you'll end up with \r\n line endings in the written file.
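As a rough sketch of the writing side, assuming the received data is already a bytes object (the function name save_received is hypothetical), opening the target in binary mode keeps the line endings exactly as they arrived:

```python
def save_received(path, payload):
    """Write received bytes unchanged (hypothetical helper).

    'wb' disables newline translation, so a \n in the payload stays
    a single \n byte on Windows as well as on POSIX.
    """
    with open(path, 'wb') as out_file:
        out_file.write(payload)
```

Reading the file back with 'rb' should return the same bytes that were passed in.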
If you are using Python 3, matters are more complicated still. When you open a file in text mode, the data is decoded from encoded bytes to Unicode values. The codec used for that can also differ from OS to OS, and even from machine to machine. The default is locale-defined (using locale.getpreferredencoding(False)), and as long as the data is decodable with that default, you can get very different results from reading a file using a different codec. You really want to ensure you use the same codec by setting it explicitly or, better still, open files in binary mode.
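To illustrate the codec issue in Python 3, here is a small sketch. The sample string and the choice of UTF-8 and cp1252 are assumptions picked for the demonstration, and read_text is a hypothetical helper; the point is only that the same bytes decode without error under both codecs yet produce different text.

```python
import locale

# The codec that open() falls back to in text mode on this machine.
default_codec = locale.getpreferredencoding(False)

raw = 'café'.encode('utf-8')       # the bytes as stored on disk
as_utf8 = raw.decode('utf-8')      # 'café'
as_cp1252 = raw.decode('cp1252')   # 'cafÃ©': decodes fine, but is wrong


def read_text(path, encoding='utf-8'):
    """Read text with an explicit codec instead of the locale default."""
    with open(path, 'r', encoding=encoding) as text_file:
        return text_file.read()
```

Passing encoding explicitly makes the result independent of the machine's locale; opening with 'rb' sidesteps decoding altogether.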
Since hashlib requires you to feed it byte strings, this is less of a problem when calculating the digest (you'd have run into the problem there and would at least have had to think about codecs), but it applies to file transfers too; writing to a text file encodes the data using the default codec.