Python - shutil does not seem to make a true binary copy of a file
Question
Following the excellent advice of a poster yesterday, I started using the shutil.copyfileobj method to make a copy of a file.
My program should make an exact copy of the file, remove the last byte and save the new copy.
I tested it last night with some very small ASCII text files so I could check it was doing what I asked it to. This morning I tried it on some actual 'complex' files, a PDF and a JPG, and it looks like the copy function is not making a true copy. I looked at the resulting files in a hex editor, and after roughly offset 0x300 something odd is going on: either data is being added, or data is being changed during the copy. I cannot tell which.
My program iteratively takes off a byte and saves a new version, and the newly created files are consistently different from the original file (apart from the missing last byte).
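The comparison done by eye in the hex editor can be automated. The following is a minimal sketch; the helper name `first_difference` and its `chunk_size` parameter are hypothetical, not part of the original program. It reads both files in binary mode and reports the first offset at which they differ.

```python
def first_difference(path_a, path_b, chunk_size=65536):
    """Return the offset of the first byte at which two files differ, or None.

    If one file is a strict prefix of the other, the returned offset is the
    length of the shorter file.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Scan the mismatching chunks byte by byte.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No differing byte found: one chunk is a prefix of the other.
                return offset + min(len(a), len(b))
            if not a:  # both files ended at the same point with no difference
                return None
            offset += len(a)
```

For a correctly nibbled copy, `first_difference(original, copy)` should return the length of the copy, because the copy must be an exact prefix of the original.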
import os
import shutil

# interationCounter and nibbleSize are defined elsewhere in the program
def doNibbleAndSave(srcfile,fileStripped,strippedExt,newpath):
    counter = '%(interationCounter)03d' % {"interationCounter":interationCounter} #creates the filename counter label
    destfile = newpath + "\\" + fileStripped + "_" + counter + strippedExt #creates the new filename
    with open(srcfile, 'r') as fsrc:
        with open(destfile, 'w+') as fdest:
            shutil.copyfileobj(fsrc, fdest)
            fdest.seek(nibbleSize, os.SEEK_END) #sets the number of bytes to be removed
            fdest.truncate()
    srcfile = destfile #makes the iterator pick up the newly 'nibbled' file to work on next
    return (srcfile)
I can also see that the newly created files are significantly smaller than the source file.
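One mechanism that can make a text-mode copy shorter than its source is newline translation. A small sketch, assuming Python 3, where text mode uses universal newlines by default, so a `\r\n` pair is read back as a single `\n`. The `latin-1` encoding is an arbitrary choice here, picked only because it maps every byte to a character, so decoding cannot fail:

```python
import os
import tempfile

# Raw bytes containing CR LF pairs, which can appear anywhere in binary data.
raw = b"line1\r\nline2\r\n"

path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(path, "wb") as f:
    f.write(raw)

# Text mode (default newline=None) translates "\r\n" to "\n" while reading.
with open(path, "r", encoding="latin-1") as f:
    as_text = f.read()

# Binary mode returns exactly the bytes that are on disk.
with open(path, "rb") as f:
    as_bytes = f.read()

print(len(raw), len(as_text), len(as_bytes))  # 14 12 14
```

PDF and JPEG files can contain such byte pairs at arbitrary positions, so a copy made through text mode can both shrink and differ from the original.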
Solution
As you already noticed, you should open the files in binary mode: open(srcfile, "rb") and open(destfile, "wb+"). Otherwise, Python will assume the files are text files and may do newline conversion, depending on the platform (see the tutorial for details).
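A minimal sketch of the function rewritten with binary modes. It rests on a few assumptions that are not in the original. The iteration counter and the number of bytes to remove are passed in as parameters (`iteration_counter` and `nibble_size`, hypothetical names) instead of coming from globals. `nibble_size` is taken to be a positive byte count, so the seek offset is its negation. Finally, `os.path.join` replaces the hard-coded `"\\"` separator.

```python
import os
import shutil


def do_nibble_and_save(srcfile, file_stripped, stripped_ext, newpath,
                       iteration_counter, nibble_size=1):
    """Copy srcfile byte for byte, drop its last nibble_size bytes, and
    return the path of the new file (hypothetical variant of doNibbleAndSave)."""
    counter = "%03d" % iteration_counter  # zero-padded filename counter label
    destfile = os.path.join(newpath, file_stripped + "_" + counter + stripped_ext)
    # Binary modes: bytes are copied unchanged, with no newline translation.
    with open(srcfile, "rb") as fsrc, open(destfile, "wb+") as fdest:
        shutil.copyfileobj(fsrc, fdest)
        # Move nibble_size bytes back from the end (the file must be at least
        # that long) and cut off everything from that position onward.
        fdest.seek(-nibble_size, os.SEEK_END)
        fdest.truncate()
    return destfile  # the next iteration works on the newly nibbled file
```

Calling it in a loop and feeding each returned path back in as `srcfile` reproduces the iterative nibbling. Each result should then be an exact prefix of the original file.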