Python - shutil seems to be not making a true binary copy of a file

https://stackoverflow.com/questions/10627099

09-06-2021

Question

Following the excellent advice of a poster yesterday, I started using the shutil.copyfileobj method to make a copy of a file.

My program should make an exact copy of the file, remove the last byte, and save the new copy.

I tested it last night with some very small ASCII text files so I could check it was doing what I asked it to. This morning I tried it on some actual 'complex' files, a PDF and a JPG, and it looks like the copy function is not making a true copy. I looked at the resulting files in a hex editor, and after about offset 0x300 something odd is going on: either data is being added or data is being changed during the copy. I cannot tell which.
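To pin down where a copy first diverges from the original without scanning a hex editor by eye, a small helper can report the first differing byte offset. This is a minimal sketch: the function name `first_difference` is hypothetical, and it reads both files fully into memory, which assumes the test files are small.

```python
def first_difference(path_a, path_b):
    """Return the offset of the first byte at which two files differ.

    Returns None when one file is an exact prefix of the other (which is what
    a correctly "nibbled" copy should be) or when the files are identical.
    Hypothetical helper; reads both files fully into memory.
    """
    # Binary mode so the bytes are compared exactly as they are on disk.
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a = fa.read()
        b = fb.read()
    # zip stops at the end of the shorter file, so only the overlap is compared.
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    return None
```

For a correctly nibbled copy the result should be `None`, because the copy ought to be an exact prefix of the original. When it is not `None`, `hex(offset)` gives a value directly comparable to the offsets shown in a hex editor.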

My program iteratively takes off a byte and saves a new version, and I can see that the newly created files consistently differ from the original file by more than just the missing last byte.

```python
import os
import shutil

def doNibbleAndSave(srcfile, fileStripped, strippedExt, newpath):
    # interationCounter and nibbleSize are presumably globals defined elsewhere in the program
    counter = '%(interationCounter)03d' % {"interationCounter": interationCounter}  # creates the filename counter label
    destfile = newpath + "\\" + fileStripped + "_" + counter + strippedExt  # creates the new filename
    with open(srcfile, 'r') as fsrc:
        with open(destfile, 'w+') as fdest:
            shutil.copyfileobj(fsrc, fdest)
            fdest.seek(nibbleSize, os.SEEK_END)  # sets the number of bytes to be removed
            fdest.truncate()
    srcfile = destfile  # makes the iterator pick up the newly 'nibbled' file to work on next
    return srcfile
```

I can also see that the newly created files are significantly smaller than the source file.
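One way to see how text mode can both change and shrink binary data is to read the same file twice, once in binary mode and once in text mode, and compare the lengths. The sketch below assumes Python 3, where text mode applies universal-newline translation by default (`\r\n` and a lone `\r` both become `\n`). The helper name is hypothetical, and `latin-1` is chosen only because it decodes every byte to exactly one character, so any length difference comes from newline translation alone.

```python
def text_vs_binary_length(path):
    """Return (bytes read in binary mode, characters read in text mode).

    Hypothetical helper. latin-1 maps each byte to one character, so the two
    numbers differ only because of newline translation in text mode.
    """
    with open(path, "rb") as f:
        raw = f.read()  # exact bytes on disk
    with open(path, "r", encoding="latin-1") as f:
        text = f.read()  # default newline=None: "\r\n" and "\r" become "\n"
    return len(raw), len(text)
```

Compressed formats such as JPEG can contain `0x0D 0x0A` byte pairs purely by chance, and each such pair loses a byte when the file is read in text mode.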

Solution

As you already noticed, you should open the files in binary mode: open(srcfile, "rb") and open(destfile, "wb+"). Otherwise, Python will assume the files are text files and may do newline conversion, depending on the platform (see the tutorial for details).
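Applying the answer to the question's function might look like the sketch below. Beyond switching to `"rb"` and `"wb+"`, it makes a few assumptions of its own: the name `nibble_and_save` and the `iteration` and `nibble_size` parameters are hypothetical stand-ins for the question's `interationCounter` and `nibbleSize` globals, `nibble_size` is taken as a positive byte count (so the seek offset is `-nibble_size`), and `os.path.join` replaces the hard-coded `"\\"` separator.

```python
import os
import shutil


def nibble_and_save(srcfile, file_stripped, stripped_ext, newpath, iteration, nibble_size=1):
    """Copy srcfile byte-for-byte to a numbered file, then drop its last nibble_size bytes.

    Hypothetical signature: `iteration` and `nibble_size` stand in for the
    question's interationCounter and nibbleSize globals.
    Returns the new path so the caller can nibble the result on the next round.
    """
    counter = "%03d" % iteration  # e.g. 7 -> "007"
    destfile = os.path.join(newpath, file_stripped + "_" + counter + stripped_ext)
    # Binary modes: bytes are copied as-is, with no newline translation or decoding.
    with open(srcfile, "rb") as fsrc, open(destfile, "wb+") as fdest:
        shutil.copyfileobj(fsrc, fdest)
        # Move nibble_size bytes back from the end and cut off everything after that point.
        fdest.seek(-nibble_size, os.SEEK_END)
        fdest.truncate()
    return destfile
```

Each call returns the new path, so chaining calls (`path = nibble_and_save(path, ...)`) reproduces the question's iterative nibbling. If a file is shorter than `nibble_size` bytes, the `seek` would move before the start of the file and should fail with an `OSError`.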

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow