unzipping a file with Python and returning all the directories it creates

Question 1

You can read the contents of the zip file with the namelist() method. Directories will have a trailing path separator:

>>> import zipfile
>>> zip = zipfile.ZipFile('test.zip')
>>> zip.namelist()
['dir2/', 'file1']

You can do this before or after extracting contents.

Depending on your operating environment, the result of namelist() may be limited to the top-level paths of the zip archive (e.g. Python on Linux) or may cover the full contents of the archive (e.g. IronPython on Windows).

The namelist() returns a complete listing of the zip archive contents, with directories marked with a trailing path separator. For instance, a zip archive of the following file structure:

./file1
./dir2
./dir2/dir21
./dir3
./dir3/file3
./dir3/dir31
./dir3/dir31/file31

results in the following list being returned by zipfile.ZipFile.namelist():

[ 'file1', 
  'dir2/', 
  'dir2/dir21/', 
  'dir3/', 
  'dir3/file3', 
  'dir3/dir31/', 
  'dir3/dir31/file31' ]

Question 2

ZipFile.namelist will return a list of the names of the items in an archive. However, these names will only be the full names of files including their directory path. (A zip file can only contain files, not directories, so directories are implied by archive member names.) To determine the directories created, you need a list of every directory created implicitly by each file.

The dirs_in_zip() function below will do this and collect all dir names into a set.

import zipfile
import os

def parent_dirs(pathname, subdirs=None):
    """Return a set of all individual directories contained in a pathname

    For example, if 'a/b/c.ext' is the path to the file 'c.ext':
    a/b/c.ext -> set(['a','a/b'])
    """
    if subdirs is None:
        subdirs = set()
    parent = os.path.dirname(pathname)
    if parent:
        subdirs.add(parent)
        parent_dirs(parent, subdirs)
    return subdirs


def dirs_in_zip(zf):
    """Return a list of directories that would be created by the ZipFile zf"""
    alldirs = set()
    for fn in zf.namelist():
        alldirs.update(parent_dirs(fn))
    return alldirs


zf = zipfile.ZipFile(zipfilename, 'r')

print(dirs_in_zip(zf))

Question 3

Let it finish and then read the content of the directory - here is a good example of this.

Question 4

Assuming no one else will be writing the target directory at the same time, walk the directory recursively prior to unzipping, then afterwards, and compare the results.