Question

I have a Python script that uses the pyserial library to send a file over serial to another computer. I wrote some code to calculate the MD5 checksum of the file before and after it is sent over serial, and I have run into a problem.

Example: I sent a simple file named third.txt containing a list of numbers 1 through 10. Simple file, nothing fancy or large. The checksum of the file before transmitting is completely different than the checksum of the file after transmitting on the other computer, even though the files are clearly the same.

I checked whether something was wrong with my code by simply moving the file over on a USB drive and doing the checksum calculations that way. This time it worked.

Any ideas why this is happening and how I might possibly fix it?

Here is my checksum code before sending. This is not the exact code, but basically what I did.

<<Code that waits for command from client>>

with open(file_loc) as file_to_read:
    data = file_to_read.read()
    md5a = hashlib.md5(data).hexdigest()
ser.write('\n' + md5a + '\n') 

Here is my checksum code after sending.

with open(file_loc) as file_to_read:
    data = file_to_read.read()
    md5b = hashlib.md5(data).hexdigest()
print('Sending Checksum Command')
ser.write("\n<<SENDCHECKSUM>>\n")

md5a = ser.readline()
print(md5a)
print(md5b)
if md5a == md5b:
    print("Correct File Transmission")
else:
    print("The checksum indicated incorrect file transmission, please check.")
ser.flush()

Solution

Yes, opening a file in text mode can result in different data being read, because newlines are translated for you from the platform-native format to \n. Thus, a file containing \r\n line endings will give you a different checksum when read on Windows than when read on a POSIX platform.

Open files in binary mode instead:

with open(file_loc, 'rb') as file_to_read:
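
As a minimal sketch of the idea, the digest can be computed from the raw bytes like this. The helper name md5_of_file and the chunk size are illustrative assumptions, not part of the original code:

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the raw bytes stored at path.

    md5_of_file and chunk_size are hypothetical names used for illustration.
    """
    digest = hashlib.md5()
    # 'rb' disables newline translation and text decoding entirely,
    # so the same file yields the same digest on every platform.
    with open(path, 'rb') as file_to_read:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: file_to_read.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling md5_of_file(file_loc) on both machines should then compare the actual file contents rather than a platform-dependent view of them.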

Note that the same applies when writing a file. If you receive data from a POSIX system using \n line endings, and you write this to a file opened for writing in text mode on Windows, you'll end up with \r\n line endings in the written file.
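
The effect on writing can be reproduced on any platform. In this sketch, passing newline='\r\n' to open() is an assumption used to mimic the Windows text-mode default, and the temporary path is made up for the demonstration:

```python
import os
import tempfile

payload = "1\n2\n3\n"  # data as it might arrive from a POSIX sender

# Hypothetical scratch location for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "third.txt")

# Text mode with newline='\r\n' behaves like the Windows default:
# every '\n' written is translated to '\r\n' on disk.
with open(path, 'w', newline='\r\n') as f:
    f.write(payload)
with open(path, 'rb') as f:
    text_mode_bytes = f.read()

# Binary mode writes the bytes exactly as given.
with open(path, 'wb') as f:
    f.write(payload.encode('ascii'))
with open(path, 'rb') as f:
    binary_mode_bytes = f.read()

print(text_mode_bytes)    # b'1\r\n2\r\n3\r\n'
print(binary_mode_bytes)  # b'1\n2\n3\n'
```

The two files differ by three bytes, which is enough to produce completely different MD5 digests.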

If you are using Python 3, matters get more complicated still. When you open a file in text mode, the data is decoded from encoded bytes to Unicode text. The codec used for that can differ from OS to OS, and even from machine to machine. The default is locale-defined (see locale.getpreferredencoding(False)), and as long as the data is decodable with that default codec, you can get very different results from reading the same file with a different codec. You really want to ensure you use the same codec by setting it explicitly or, better still, open files in binary mode.
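
A small sketch of how the codec changes what is read. The sample bytes and the temporary file are made up for the demonstration:

```python
import os
import tempfile

raw = b'caf\xc3\xa9\n'  # UTF-8 encoding of "café" followed by a newline

# The same bytes decode to different text under different codecs,
# and neither decode raises an error.
as_utf8 = raw.decode('utf-8')
as_latin1 = raw.decode('latin-1')

print(len(as_utf8), len(as_latin1))  # 5 6
print(as_utf8 == as_latin1)          # False

# Naming the codec explicitly removes the dependence on the locale default.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")  # hypothetical path
with open(path, 'wb') as f:
    f.write(raw)
with open(path, encoding='utf-8') as f:
    text = f.read()
```

Opening the file with 'rb' avoids the question altogether, because no decoding takes place.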

Since hashlib requires you to feed it byte strings, this is less of a problem when calculating the digest (you would have run into an error and at least had to think about codecs there), but it applies to file transfers too: writing to a text file will encode the data with the default codec.
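
To illustrate the point about hashlib, here is a short sketch assuming Python 3. The sample text is made up:

```python
import hashlib

text = "1\n2\n3\n"

# In Python 3, hashlib refuses str input, which forces a codec decision.
error = None
try:
    hashlib.md5(text)
except TypeError as exc:
    error = exc

# The digest depends on which codec produced the bytes.
digest_utf8 = hashlib.md5(text.encode('utf-8')).hexdigest()
digest_utf16 = hashlib.md5(text.encode('utf-16')).hexdigest()

print(type(error).__name__)         # TypeError
print(digest_utf8 == digest_utf16)  # False
```

Reading the file in binary mode sidesteps this, since the bytes go to hashlib exactly as they are stored on disk.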

Licensed under: CC-BY-SA with attribution