Question

I want to read the contents of a zip file into memory rather than extracting them to disk, find a particular file in the archive, open that file, and extract a line from it.

Can a StringIO instance be opened and parsed? Suggestions? Thanks in advance.

zfile = ZipFile('name.zip', 'r')

for name in zfile.namelist():
    if fnmatch.fnmatch(name, '*_readme.xml'):
        name = StringIO.StringIO()
        print name # prints StringIO instances
        open(name, 'r')  # IO Error: No such file or directory...

I found a few similar posts, but none that seem to address this issue: Extracting a zipfile to memory?


Solution 4

Thank you to everyone that contributed solutions. This is what ended up working for me:

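import fnmatch
import re
from zipfile import ZipFile

zfile = ZipFile('name.zip', 'r')

for name in zfile.namelist():
    if fnmatch.fnmatch(name, '*_readme.xml'):
        zopen = zfile.open(name)  # file-like object; nothing is extracted to disk
        for line in zopen:
            # Written for Python 2; in Python 3 each line is bytes, so a bytes pattern is needed
            if re.match('(.*)<foo>(.*)</foo>(.*)', line):
                print(line)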
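
On Python 3, zfile.open() hands back a binary stream, so the loop above needs either a bytes pattern or a text wrapper. A minimal sketch of the second option, assuming the member is UTF-8 text; find_lines and the sample archive are made up for illustration and are not part of zipfile:

```python
import fnmatch
import io
import re
import zipfile


def find_lines(zip_source, name_pattern, line_regex, encoding="utf-8"):
    """Return (member_name, line) pairs for lines matching line_regex.

    zip_source may be a path or a binary file-like object.
    find_lines is a hypothetical helper, not a zipfile API.
    """
    regex = re.compile(line_regex)
    hits = []
    with zipfile.ZipFile(zip_source) as zfile:
        for name in zfile.namelist():
            if not fnmatch.fnmatch(name, name_pattern):
                continue
            # open() returns a binary stream; wrap it to iterate over str lines
            with zfile.open(name) as raw:
                for line in io.TextIOWrapper(raw, encoding=encoding):
                    if regex.search(line):
                        hits.append((name, line.rstrip("\n")))
    return hits


# Build a small archive in memory so the example needs no files on disk
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("pkg_readme.xml", "<root>\n  <foo>bar</foo>\n</root>\n")
    zf.writestr("other.txt", "<foo>ignored</foo>\n")
demo.seek(0)

matches = find_lines(demo, "*_readme.xml", r"<foo>.*</foo>")
print(matches)
```

re.search is used here instead of re.match with leading and trailing (.*) groups; both find the tag anywhere in the line.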

Other tips

IMO just using read is enough:

import fnmatch
from zipfile import ZipFile

zfile = ZipFile('name.zip', 'r')
files = []
for name in zfile.namelist():
  if fnmatch.fnmatch(name, '*_readme.xml'):
    files.append(zfile.read(name))  # read() returns the member's full contents

This builds a list holding the contents (as bytes) of every file that matches the pattern.

You can then parse the contents afterwards by iterating through the list:

for data in files:
  print(data[:35].decode()) # "parsing": show the first 35 bytes

Or better, use a function that handles each file as soon as it is read:

import fnmatch
import sys
import zipfile

zip_name = sys.argv[1]  # path to the archive, given on the command line
zfile = zipfile.ZipFile(zip_name, 'r')

def parse(contents, member_name=""):
  if member_name:
    print("Parsed `{}`:".format(member_name))
  print(contents[:35].decode())  # "parsing": show the first 35 bytes

for name in zfile.namelist():
  if fnmatch.fnmatch(name, '*.cpp'):
    parse(zfile.read(name), name)

This way each file's contents can be released as soon as they have been parsed, instead of all of them being held in a list, so the memory footprint is smaller. That can matter if the files are big.
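
The same idea can also be written as a generator, so that only one member's bytes are handed out at a time. This is only a sketch: iter_members is an illustrative name, and the archive is built in memory purely so the snippet runs on its own:

```python
import fnmatch
import io
import zipfile


def iter_members(zip_source, pattern):
    """Yield (name, contents) one matching member at a time (hypothetical helper)."""
    with zipfile.ZipFile(zip_source) as zfile:
        for name in zfile.namelist():
            if fnmatch.fnmatch(name, pattern):
                yield name, zfile.read(name)


# In-memory archive for demonstration
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("a.cpp", "int main() { return 0; }\n")
    zf.writestr("b.cpp", "// empty\n")
    zf.writestr("notes.txt", "not matched\n")
archive.seek(0)

previews = {}
for name, contents in iter_members(archive, "*.cpp"):
    previews[name] = contents[:35].decode()  # keep only a short preview

print(previews)
```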

Don't overthink it. It Just Works:

import zipfile

# 1) I want to read the contents of a zip file ...
with zipfile.ZipFile('A-Zip-File.zip') as zipper:
  # 2) ... find a particular file in the archive, open the file ...
  with zipper.open('A-Particular-File.txt') as fp:
    # 3) ... and extract a line from it.
    first_line = fp.readline()

print(first_line)
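
One thing to watch for: ZipFile.open raises KeyError when the name is not in the archive. A small sketch that wraps the snippet above and returns None in that case; first_line_of is a made-up helper, the archive is created in memory for the demo, and UTF-8 is assumed:

```python
import io
import zipfile


def first_line_of(zip_source, member, encoding="utf-8"):
    """Return the first line of member, or None if it is not in the archive."""
    with zipfile.ZipFile(zip_source) as zipper:
        try:
            with zipper.open(member) as fp:
                return fp.readline().decode(encoding)
        except KeyError:
            # Raised by ZipFile.open for a name that is not in the archive
            return None


# Stand-in for 'A-Zip-File.zip'
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("A-Particular-File.txt", "line one\nline two\n")

found = first_line_of(archive, "A-Particular-File.txt")
missing = first_line_of(archive, "No-Such-File.txt")
print(repr(found), missing)
```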

The question you link shows you that you need to read the file. Depending on your use case that may already be enough. In your code you replace the loop variable holding a filename with an empty string buffer. Try something like this:

import fnmatch
from zipfile import ZipFile

zfile = ZipFile('name.zip', 'r')

for name in zfile.namelist():
    if fnmatch.fnmatch(name, '*_readme.xml'):
        ex_file = zfile.open(name) # this is a file-like object
        content = ex_file.read() # now the file contents are a single string (bytes in Python 3)

If you really want a buffer that you can manipulate, then simply instantiate it with the contents:

buf = StringIO(zfile.open(name).read())

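You may also want to look at BytesIO, and note that Python 2 and 3 differ here: in Python 3, read() returns bytes, so io.BytesIO is the matching buffer, while io.StringIO only accepts text that has already been decoded.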
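
For Python 3, the buffer approach might look like the sketch below. The archive and member names are invented, and the content is assumed to be UTF-8. It also shows that ZipFile accepts a binary file-like object in place of a path, so the archive itself can sit in memory:

```python
import io
import zipfile

# Stand-in archive so the snippet is self-contained
zip_bytes = io.BytesIO()
with zipfile.ZipFile(zip_bytes, "w") as zf:
    zf.writestr("demo_readme.xml", "<a>first</a>\n<foo>second</foo>\n")

# ZipFile takes a seekable binary file-like object as well as a path
with zipfile.ZipFile(io.BytesIO(zip_bytes.getvalue())) as zfile:
    data = zfile.read("demo_readme.xml")      # bytes in Python 3

byte_buf = io.BytesIO(data)                   # binary buffer
text_buf = io.StringIO(data.decode("utf-8"))  # text buffer (assumes UTF-8)

first_bytes = byte_buf.readline()
first_text = text_buf.readline()
print(first_bytes, repr(first_text))
```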

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow