The problem is that you have a corrupted zip file. I'll add more details about the corruption below, but first the practical stuff:
You can use this code snippet to tell you which member within the archive is corrupted. However, `print(z.testzip())` would already tell you the same thing. And `zip -T` or `unzip` on the command line should also give you that info with the appropriate verbosity.
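To see what `testzip()` reports, here is a minimal sketch that builds a small archive in memory and deliberately overwrites the local header signature of one member. The helper `archive_with_bad_member` and the file names are made up for illustration. `testzip()` returns the name of the first member it cannot read, or `None` if everything checks out:

```python
import io
import zipfile

def archive_with_bad_member():
    """Hypothetical helper: build a tiny in-memory zip, then overwrite the
    local header signature of its second member, b.txt."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as z:
        z.writestr('a.txt', 'hello')
        z.writestr('b.txt', 'world')
    data = bytearray(buf.getvalue())
    with zipfile.ZipFile(io.BytesIO(bytes(data))) as z:
        offset = z.getinfo('b.txt').header_offset
    data[offset:offset + 4] = b'XXXX'  # clobber the PK\003\004 signature
    return io.BytesIO(bytes(data))

with zipfile.ZipFile(archive_with_bad_member()) as z:
    print(z.testzip())  # b.txt, the first member that fails to read
    try:
        z.read('b.txt')
    except zipfile.BadZipFile as e:
        print(e)        # Bad magic number for file header
```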
So, what do you do about it?
Well, obviously, if you can get an uncorrupted copy of the file, do that.
If not, and you just want to skip over the bad member and extract everything else, that's pretty easy; it's mostly the same code as the snippet linked above:

```python
import sys
import zipfile

with open(sys.argv[1], 'rb') as zf:
    z = zipfile.ZipFile(zf, allowZip64=True)
    for member in z.infolist():
        try:
            z.extract(member)
        except zipfile.error as e:
            # log the error, the member.filename, whatever
            sys.stderr.write('skipping {}: {}\n'.format(member.filename, e))
```
The `Bad magic number for file header` exception message means that `zipfile` was able to successfully open the zipfile, parse its directory, find the information for a member, seek to that member within the archive, and read the header of that member. All of that means you probably have no zip64-related problems on `zipfile`'s side. However, when it read that header, it did not find the expected "magic" signature of `PK\003\004`. That means the archive is corrupted.
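If you want to see the bad signature for yourself, a sketch like the following (the function name `members_with_bad_magic` is hypothetical) uses the central directory, which `zipfile` parsed fine, to find each member's `header_offset`, then reads the four raw bytes there and compares them with `PK\003\004`. The demo corrupts an in-memory archive, so it doesn't touch any real file:

```python
import io
import zipfile

LOCAL_HEADER_MAGIC = b'PK\x03\x04'  # local file header signature, PK\003\004

def members_with_bad_magic(fileobj):
    """Return (name, offset) for each member whose local header does not
    start with the expected signature."""
    bad = []
    with zipfile.ZipFile(fileobj) as z:  # does not close a passed-in fileobj
        for info in z.infolist():
            fileobj.seek(info.header_offset)
            if fileobj.read(4) != LOCAL_HEADER_MAGIC:
                bad.append((info.filename, info.header_offset))
    return bad

# Demo: an in-memory archive whose second member's signature is zeroed out.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.writestr('a.txt', 'hello')
    z.writestr('b.txt', 'world')
raw = bytearray(buf.getvalue())
offset = zipfile.ZipFile(io.BytesIO(bytes(raw))).getinfo('b.txt').header_offset
raw[offset:offset + 4] = b'\0\0\0\0'
print(members_with_bad_magic(io.BytesIO(bytes(raw))))
```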
The fact that the corruption happens to be at exactly 4294967296 implies very strongly that you had a 64-bit problem somewhere along the chain, because that's exactly 2**32.
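A quick check of the arithmetic. This sketch is only meant to show why that particular number is suspicious: it is the first value that doesn't fit in the 4-byte offset fields of a classic (non-zip64) zip header. The truncation at the end is one hypothetical failure mode, not a diagnosis of what happened to your file:

```python
import struct

offset = 4294967296            # where the bad header was found
assert offset == 2**32         # one past the largest 32-bit unsigned value
print(hex(offset - 1))         # 0xffffffff, the most a 4-byte field can hold

try:
    struct.pack('<I', offset)  # a 4-byte little-endian unsigned field
except struct.error as e:
    print('does not fit in 32 bits:', e)

# Hypothetical: a writer that silently truncated to 32 bits would record 0.
print(offset & 0xFFFFFFFF)     # 0
```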
The command-line `zip`/`unzip` tools have some workarounds to deal with common causes of corruption that lead to problems like this. It looks like those workarounds may be working for this archive, given that you get a warning but all of the files are apparently recovered. Python's `zipfile` library does not have those workarounds, and I doubt you want to write your own zip-handling code…
However, that does open the door for two more possibilities:
First, `zip` might be able to repair the zipfile for you, using the `-F` or `-FF` flag. (Read the manpage, or `zip -h`, or ask at a site like SuperUser if you need help with that.)
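If you'd rather drive the repair from Python, a minimal sketch might look like this. The helper names are hypothetical, and it assumes Info-ZIP `zip` 3.x, whose `--out` option writes the repaired copy to a new file instead of modifying the original; check the manpage for your version before relying on it:

```python
import subprocess

def zip_repair_command(broken, fixed, aggressive=False):
    """Hypothetical helper: build the zip command line for a repair.

    -F is the normal fix; -FF is the more thorough mode for badly damaged
    archives. --out writes the result to `fixed`, leaving `broken` untouched.
    """
    flag = '-FF' if aggressive else '-F'
    return ['zip', flag, broken, '--out', fixed]

def repair_zip(broken, fixed, aggressive=False):
    """Run the repair (requires the zip tool on PATH); inspect returncode."""
    return subprocess.run(zip_repair_command(broken, fixed, aggressive))
```

Usage would be something like `repair_zip('broken.zip', 'fixed.zip')`, falling back to `aggressive=True` if the plain `-F` pass doesn't help.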
And if all else fails, you can run the `unzip` tool from Python instead of using `zipfile`, like this:

```python
import subprocess
subprocess.check_output(['unzip', fname])  # fname: path to the archive
```
That gives you a lot less flexibility and power than the `zipfile` module, of course, but you're not using any of that flexibility anyway; you're just calling `extractall`.
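One caveat: `unzip` exits with status 1 when it hit warnings but still finished processing, and `check_output` raises `CalledProcessError` for any nonzero status. Since you do get a warning from `unzip` on this archive, you may need to tolerate status 1. A sketch, where the helper names are hypothetical, `-o` overwrites existing files without prompting and `-d` picks the destination directory:

```python
import subprocess

# unzip exit statuses: 0 = no problems, 1 = warnings but processing
# completed; anything higher is treated here as a real failure.
UNZIP_OK_STATUSES = (0, 1)

def unzip_succeeded(returncode):
    return returncode in UNZIP_OK_STATUSES

def run_unzip(fname, dest='.'):
    """Extract fname into dest with the unzip CLI, tolerating warnings."""
    proc = subprocess.run(['unzip', '-o', fname, '-d', dest],
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    if not unzip_succeeded(proc.returncode):
        raise subprocess.CalledProcessError(proc.returncode, proc.args,
                                            output=proc.stdout,
                                            stderr=proc.stderr)
    return proc
```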