How to iterate through an in-memory zip file in Ruby

Question 1

See @bronson’s answer for a more up-to-date version of this answer using the newer RubyZip API.

The Rubyzip docs you linked to look a bit old. The latest release at the time of writing (0.9.9) can handle IO objects, so you can use a StringIO (with a little tweaking).

Even though the API will accept an IO, it still seems to assume it’s a file and tries to call path on it, so first monkey-patch StringIO to add a path method (it doesn’t need to actually do anything):

require 'stringio'

# rubyzip 0.9.x calls #path on its input; a no-op method (returning nil) is enough
class StringIO
  def path
  end
end

Then you can do something like:

require 'zip/zip'
Zip::ZipInputStream.open_buffer(StringIO.new(last_response.body)) do |io|
  while (entry = io.get_next_entry)
    # deal with your zip contents here, e.g.
    puts "Contents of #{entry.name}: '#{io.read}'"
  end
end

and everything will be done in memory.
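If you would rather not reopen StringIO for the whole process, a hedged alternative is a small subclass that adds the same no-op path method only to the objects you create. The class name PathStringIO is made up for this sketch; the idea is to pass PathStringIO.new(last_response.body) to open_buffer in place of the plain StringIO, assuming (as described above) that rubyzip 0.9.x only needs path to exist.

```ruby
require 'stringio'

# Hypothetical StringIO subclass for old rubyzip (0.9.x), which calls #path
# on its input. Only instances of this class get the method; StringIO itself
# is left untouched.
class PathStringIO < StringIO
  # There is no backing file, so report no path.
  def path
    nil
  end
end
```

Since it is still a StringIO, everything else (read, seek, binmode) behaves exactly as before.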

Question 2

Matt's answer is exactly right. Here it is updated to the new API:

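require 'stringio'
require 'zip'

# input: the .kmz (zip) file contents as a String
Zip::InputStream.open(StringIO.new(input)) do |io|
  while (entry = io.get_next_entry)
    if entry.name == 'doc.kml'
      parse_kml(io.read) # parse_kml: your own KML handler
    else
      raise "unknown entry in kmz file: #{entry.name}"
    end
  end
end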

And there's no need to monkeypatch StringIO anymore. Progress!

Question 3

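require 'zip'

decompressed_data = String.new # empty binary buffer; must exist before appending to it
Zip::File.open_buffer(content) do |zip| # content holds the raw zip bytes
  zip.each do |entry|
    next unless entry.file? # skip directory entries
    decompressed_data << entry.get_input_stream.read
  end
end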

Question 4

With RubyZip version 1.2.1 (or maybe some earlier versions too), you just need the open_buffer method of the Zip::File class.

From RubyZip documentation:

Like #open, but reads zip archive contents from a String or open IO stream, and outputs data to a buffer. (This can be used to extract data from a downloaded zip archive without first saving it to disk.)

Example:

Zip::File.open_buffer(last_response.body) do |zip|
  zip.each do |entry|
    puts entry.name
    # Do whatever you want with the content files.
  end
end

Question 5

You could use Tempfile to dump the zip file into a temporary file. Tempfile creates a temporary file in the operating system's temp directory; Ruby deletes it when the Tempfile object is garbage-collected or the program exits, and you can call close! (or unlink) to remove it sooner.
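A minimal sketch of that approach, assuming the zip bytes are already in a String (for example last_response.body). The helper name with_temp_zip is hypothetical: it writes the bytes in binary mode, yields the file's path, and deletes the file as soon as the block finishes instead of waiting for garbage collection.

```ruby
require 'tempfile'

# Hypothetical helper: writes zip bytes to a temporary file and yields its path.
# The file is closed and deleted (close!) when the block returns or raises.
def with_temp_zip(bytes)
  file = Tempfile.new(['download', '.zip'])
  file.binmode     # zip data is binary; avoid newline/encoding conversion
  file.write(bytes)
  file.flush       # make sure the bytes are on disk before another reader opens the path
  yield file.path  # the block's value becomes the method's return value
ensure
  file&.close!
end
```

Inside the block you would open the path with whichever file API your rubyzip version provides, for example Zip::File.open(path) on 1.x.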

Question 6

Inspired by Matt's answer, here is a slightly modified solution for anyone who has to use the 0.9.x rubyzip gem. It doesn't reopen the StringIO class; it adds path only to the one StringIO instance it uses.

require 'stringio'
require 'zip/zip' # rubyzip 0.9.x

sio = StringIO.new(response.body)
# Give only this instance a no-op #path to satisfy the ancient rubyzip 0.9.8 gem
sio.define_singleton_method(:path) {}
Zip::ZipInputStream.open_buffer(sio) do |io|
  while (entry = io.get_next_entry)
    puts "Contents of #{entry.name}"
  end
end

Question 7

This worked for me. In my case the archive holds only one file, so I used a fixed path, but you can use entry.name to build your path.

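require 'httparty'
require 'zip'

input = HTTParty.get(link).body
Zip::File.open_buffer(input) do |zip_file|
  zip_file.each do |entry|
    entry.extract(path) # path: destination file path (fixed, since there is only one entry)
  end
end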
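When there is more than one entry, entry.name comes from the archive itself, so it is worth checking that the resulting path stays inside your target directory (a name such as ../evil.rb would otherwise escape it). A minimal stdlib-only sketch; safe_destination and dest_dir are hypothetical names, not part of rubyzip.

```ruby
# Hypothetical helper: resolves an archive entry name against dest_dir and
# refuses names (absolute paths, "../" sequences) that would land outside it.
def safe_destination(dest_dir, entry_name)
  root = File.absolute_path(dest_dir)
  # absolute_path (unlike expand_path) does not treat a leading "~" as a home directory
  target = File.absolute_path(entry_name, root)
  prefix = root.end_with?(File::SEPARATOR) ? root : root + File::SEPARATOR
  unless target.start_with?(prefix)
    raise ArgumentError, "unsafe zip entry name: #{entry_name.inspect}"
  end
  target
end
```

Inside the loop you could then call entry.extract(safe_destination(dest_dir, entry.name)), assuming a rubyzip version whose Entry#extract takes a destination path as in the snippet above; you may also need FileUtils.mkdir_p on the parent directory first.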

Question 8

Just an update on this one, reflecting changes in the rubyzip API:

require 'stringio'
require 'zip'

# zip_file holds the raw zip bytes as a String
Zip::InputStream.open(StringIO.new(zip_file)) do |io|
  while (entry = io.get_next_entry)
    # deal with your zip contents here, e.g.
    puts "Contents of #{entry.name}: '#{io.read}'"
  end
end