I am working on importing CSV files via rake tasks (as per my question here the other day).

namespace :csvimport do

  desc "Import Products CSV Data."
  task :products => :environment do

    ActiveRecord::Migration.drop_table :products

    require 'csv'
    csv_file_path = '/home/jay/workspace/db/import_tables/products.csv'
    CSV.foreach(csv_file_path) do |row|
      p = Product.create!({
          :product_id => row[0],
          :product_name => row[1],
        }
      )
    end
  end
end

This works great for small files (say, 10,000 rows), but when I try a larger file with upwards of a million rows it takes a very long time. I also get no feedback that the process is happening. If I go into pgAdmin3 and run SELECT count(*) FROM sales; I can see the count going up by only 10 or 20 rows per second.

Does anyone have any suggestions on a better way to do this? I can import the data directly through pgAdmin with SQL, which is very fast (a couple of minutes), but I want to build a solution so that when I go to production I can do this through an admin interface.

That said, once I go to production I will attempt to import only new data from one of our older systems when bringing it into Rails.

Also, how can I kill the running rake task? Is there a better way than just closing out of the terminal?


Solution

Use COPY.

The Ruby Pg gem supports this with the copy_data method of PG::Connection.

The gem comes with examples of how to use it.
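
For reference, here is a rough sketch of how the original rake task might be rewritten around copy_data. It assumes the raw PG::Connection is reachable through ActiveRecord::Base.connection.raw_connection, reuses the table, columns, and CSV path from the question, and has not been tested against your schema:

namespace :csvimport do

  desc "Import Products CSV Data via PostgreSQL COPY."
  task :products => :environment do
    require 'csv'

    csv_file_path = '/home/jay/workspace/db/import_tables/products.csv'

    # Get the underlying PG::Connection that ActiveRecord is using.
    conn = ActiveRecord::Base.connection.raw_connection

    # Stream the file to the server with COPY ... FROM STDIN instead of
    # issuing one INSERT per row.
    conn.copy_data("COPY products (product_id, product_name) FROM STDIN WITH (FORMAT csv)") do
      CSV.foreach(csv_file_path) do |row|
        conn.put_copy_data(CSV.generate_line(row[0..1]))
      end
    end
  end
end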

To learn more about improving insertion rates in general, see How to speed up insertion performance in PostgreSQL. In particular, see the advice regarding batching work into transactions.
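
If you stay with ActiveRecord instead of COPY, the simplest form of that advice is to commit rows in batches rather than one at a time. A minimal sketch, assuming the question's Product model and an arbitrary batch size of 1,000:

require 'csv'

csv_file_path = '/home/jay/workspace/db/import_tables/products.csv'

# Each slice of 1,000 rows is inserted inside a single transaction,
# so the database commits once per batch instead of once per row.
CSV.foreach(csv_file_path).each_slice(1000) do |batch|
  Product.transaction do
    batch.each do |row|
      Product.create!(:product_id => row[0], :product_name => row[1])
    end
  end
end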
