Question

There is a program that generates huge CSV files. For example:

arr = (0..10).to_a
CSV.open("foo.csv", "wb") do |csv|
  (2**16).times { csv << arr }
end

This generates a big file, so I want to compress it on the fly: instead of writing an uncompressed CSV file (foo.csv), I want to write a bzip2-compressed CSV file (foo.csv.bzip).

I have an example from the "ruby-bzip2" gem:

writer = Bzip2::Writer.new File.open('file')
writer << 'data1'
writer.close

I am not sure how to combine the Bzip2 writer with the CSV writer.


Solution

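You can construct a CSV object from an IO, or from anything sufficiently IO-like, such as a Bzip2::Writer. Rows appended to the CSV object are then passed straight to the writer, which compresses them as they arrive.

For example: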

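require 'csv'
require 'bzip2' # assumed require name for the gem from the question

arr = (0..10).to_a

File.open('file.bz2', 'wb') do |f|
  # Bzip2::Writer compresses whatever is appended to it and writes to f.
  writer = Bzip2::Writer.new f
  # CSV() wraps the IO-like writer, so each row is compressed as it is written.
  CSV(writer) do |csv|
    (2**16).times { csv << arr }
  end
  writer.close # flush any remaining compressed data
end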
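
If you want to try the pattern without installing a gem, here is a minimal sketch that swaps in Zlib::GzipWriter from the standard library. Note that this produces gzip output, not bzip2, so it only illustrates the "IO-like object" idea rather than answering the bzip2 part. The output path is an assumption chosen for illustration.

```ruby
require 'csv'
require 'zlib'
require 'tmpdir'

arr = (0..10).to_a

# Hypothetical output path, used here only for illustration.
path = File.join(Dir.tmpdir, 'foo.csv.gz')

# Zlib::GzipWriter accepts <<, which is what CSV needs from its output.
# The block form closes the writer (and finishes the gzip stream) on exit.
Zlib::GzipWriter.open(path) do |gz|
  csv = CSV.new(gz)
  (2**16).times { csv << arr }
end
```

Each row is handed to the compressor as soon as it is generated, so the uncompressed CSV never has to exist on disk.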

OTHER TIPS

Maybe it would be more flexible to write the CSV data to stdout:

# csv.rb
require 'csv'
$stdout.sync = true

arr = (0..10).to_a
(2**16).times do
  puts arr.to_csv
end

... and pipe the output to bzip2:

$ ruby csv.rb | bzip2 > foo.csv.bz2
Licensed under: CC-BY-SA with attribution