Question

I have been using pylzma for a while, but I need to create files that are compatible with the 7-Zip Windows application. The caveat is that some of my files are really large (3 to 4 GB, created by third-party software in a proprietary binary format).

I have searched here repeatedly and gone through the usage instructions at: https://github.com/fancycode/pylzma/blob/master/doc/USAGE.md

I am able to create compatible files with the following code:

def Compacts(folder,f):
  os.chdir(folder)
  fsize=os.stat(f).st_size
  t=time.clock()
  i = open(f, 'rb')
  o = open(f+'.7z', 'wb')
  i.seek(0)

  s = pylzma.compressfile(i)
  result = s.read(5)
  result += struct.pack('<Q', fsize)
  s=result+s.read()
  o.write(s)
  o.flush()
  o.close()
  i.close()
  os.remove(f)

The smaller files (up to 2 GB) compress well with this code and are compatible with 7-Zip, but the larger files crash Python after some time.

According to the user guide, large files should be compressed using streaming, but then the resulting file is not compatible with 7-Zip, as in the snippet below.

def Compacts(folder,f):
  os.chdir(folder)
  fsize=os.stat(f).st_size
  t=time.clock()
  i = open(f, 'rb')
  o = open(f+'.7z', 'wb')
  i.seek(0)

  s = pylzma.compressfile(i)
  while True:
    tmp = s.read(1)
    if not tmp: break
    o.write(tmp)
  o.flush()
  o.close()
  i.close()
  os.remove(f)

Any ideas on how I can use the streaming technique from pylzma while keeping 7-Zip compatibility?


Solution

When streaming, you still need to write the 5-byte header (.read(5)) followed by the uncompressed size before the compressed data, e.g. like so:

import os
import struct

import pylzma

def sevenzip(infile, outfile):
    size = os.stat(infile).st_size
    with open(infile, "rb") as ip, open(outfile, "wb") as op:
        s = pylzma.compressfile(ip)
        op.write(s.read(5))
        op.write(struct.pack('<Q', size))
        while True:
            # Read 128K chunks.
            # Not sure if this has to be 1 instead to trigger streaming in pylzma...
            tmp = s.read(1<<17)
            if not tmp:
                break
            op.write(tmp)

if __name__ == "__main__":
    import sys
    try:
        _, infile, outfile = sys.argv
    except ValueError:
        infile, outfile = __file__, __file__ + u".7z"

    sevenzip(infile, outfile)
    print("compressed {} to {}".format(infile, outfile))
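
To check what ended up at the start of the output file, the 13 header bytes can be decoded by hand. The following is a minimal sketch that uses only the standard library and does not call pylzma. It assumes the usual LZMA layout, where the 5 bytes returned by .read(5) are one properties byte plus a 4-byte little-endian dictionary size, followed by the 8-byte little-endian size written with struct.pack('<Q', size). The names parse_lzma_header and read_header are made up for illustration.

```python
import struct


def parse_lzma_header(header):
    """Decode the first 13 bytes of the output file (hypothetical helper).

    Assumed layout: 1 properties byte, 4-byte dictionary size,
    8-byte uncompressed size, all little-endian.
    """
    if len(header) < 13:
        raise ValueError("need at least 13 header bytes")
    props, dict_size, size = struct.unpack("<BIQ", header[:13])
    # The properties byte packs three values as (pb * 5 + lp) * 9 + lc.
    lc = props % 9
    lp = (props // 9) % 5
    pb = props // 45
    return {
        "lc": lc,
        "lp": lp,
        "pb": pb,
        "dict_size": dict_size,
        "uncompressed_size": size,
    }


def read_header(path):
    """Read and decode the header of a file on disk (hypothetical helper)."""
    with open(path, "rb") as fh:
        return parse_lzma_header(fh.read(13))
```

If uncompressed_size in the result does not match os.stat(infile).st_size, the size field was probably skipped or written in the wrong place, which is what happens in the streaming snippet from the question.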
Licensed under: CC-BY-SA with attribution