Question

I have an external XML file download that needs to be unzipped and parsed. I have downloaded and unzipped it, but now it is stuck as a Zip::Entry object and I am unable to parse it with Nokogiri.

require 'open-uri'
require 'zip'
require 'nokogiri'

url = 'https://download.api.bingads.microsoft.com/ReportDownload/Download.aspx?xmlfile'
zip_file = URI.open(url)   # Kernel#open no longer fetches URLs as of Ruby 3.0
# file pulled down successfully => tmp/localpath

unzippedxml = Zip::File.open(zip_file.path) do |z|
  xml_file = z.first
end
# output is my xml file => myxml.xml

unzippedxml.class  # => Zip::Entry

Nokogiri::XML("unzippedxml")
# => #<Nokogiri::XML::Document:0x212b2c0 name="document">

How do I parse this file? I created a dummy XML file that didn't need to be unzipped and was able to parse it in the console, but I can't get this one open.

Any help would be greatly appreciated!


Solution

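Zip::File (Zip::ZipFile in rubyzip versions before 1.0) represents the entire zip container, and the Zip::Entry you get from z.first only describes one file inside that container; it is not the file's contents. Nokogiri needs the contents, so read them out of the archive first. For example, Zip::File#read returns the contents of a file with a specific name: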

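require 'zip'        # rubyzip >= 1.0; older versions used require 'zip/zip' and Zip::ZipFile
require 'nokogiri'

xml_source = Zip::File.open('some.zip') do |zip|    # open zip (closed again when the block ends)
  zip.read('filename_inside_zip.xml')               # read file contents as a String
end

doc = Nokogiri::XML(xml_source)                     # now parse the contents with Nokogiri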

Or, if you don't know the name but there's always only one file in the Zip, you can just take the first one:

require 'zip'        # rubyzip >= 1.0
require 'nokogiri'

xml_source = Zip::File.open('some.zip') do |zip|    # open zip
  entry = zip.entries.reject(&:directory?).first    # take first non-directory entry
  entry.get_input_stream(&:read)                    # read file contents as a String
end

doc = Nokogiri::XML(xml_source)                     # now parse the contents with Nokogiri
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow