Question

How can I unzip a .zip file with Python into some directory output_dir and fetch a list of all the directories made by the unzipping as a result? For example, if I have:

unzip('myzip.zip', 'outdir')

outdir is a directory that might have other files/directories in it. When I unzip myzip.zip into it, I'd like unzip to return all the directories made in outdir/ as a result of the zipping. Here is my code so far:

import zipfile
def unzip(zip_file, outdir):
    """
    Unzip a given 'zip_file' into the output directory 'outdir'.
    """
    zf = zipfile.ZipFile(zip_file, "r")
    zf.extractall(outdir)

How can I make unzip return the dirs it creates in outdir? thanks.

Edit: the solution that makes most sense to me is to get ONLY the top-level directories in the zip file and then recursively walk through them which will guarantee that I get all the files made by the zip. Is this possible? The system specific behavior of namelist makes it virtually impossible to rely on

Was it helpful?

Solution

You can read the contents of the zip file with the namelist() method. Directories will have a trailing path separator:

>>> import zipfile
>>> zip = zipfile.ZipFile('test.zip')
>>> zip.namelist()
['dir2/', 'file1']

You can do this before or after extracting contents.

Depending on your operating environment, the result of namelist() may be limited to the top-level paths of the zip archive (e.g. Python on Linux) or may cover the full contents of the archive (e.g. IronPython on Windows).

The namelist() returns a complete listing of the zip archive contents, with directories marked with a trailing path separator. For instance, a zip archive of the following file structure:

./file1
./dir2
./dir2/dir21
./dir3
./dir3/file3
./dir3/dir31
./dir3/dir31/file31

results in the following list being returned by zipfile.ZipFile.namelist():

[ 'file1', 
  'dir2/', 
  'dir2/dir21/', 
  'dir3/', 
  'dir3/file3', 
  'dir3/dir31/', 
  'dir3/dir31/file31' ]

OTHER TIPS

ZipFile.namelist will return a list of the names of the items in an archive. However, these names will only be the full names of files including their directory path. (A zip file can only contain files, not directories, so directories are implied by archive member names.) To determine the directories created, you need a list of every directory created implicitly by each file.

The dirs_in_zip() function below will do this and collect all dir names into a set.

import zipfile
import os

def parent_dirs(pathname, subdirs=None):
    """Return a set of all individual directories contained in a pathname

    For example, if 'a/b/c.ext' is the path to the file 'c.ext':
    a/b/c.ext -> set(['a','a/b'])
    """
    if subdirs is None:
        subdirs = set()
    parent = os.path.dirname(pathname)
    if parent:
        subdirs.add(parent)
        parent_dirs(parent, subdirs)
    return subdirs


def dirs_in_zip(zf):
    """Return a list of directories that would be created by the ZipFile zf"""
    alldirs = set()
    for fn in zf.namelist():
        alldirs.update(parent_dirs(fn))
    return alldirs


zf = zipfile.ZipFile(zipfilename, 'r')

print(dirs_in_zip(zf))

Let it finish and then read the content of the directory - here is a good example of this.

Assuming no one else will be writing the target directory at the same time, walk the directory recursively prior to unzipping, then afterwards, and compare the results.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top