Question

I usually write my inserts like this:

query = "
  INSERT INTO
    SomeTable (Column1, Column2)
  VALUES
    #{data.map { |row| "(#{row[0]},#{row[1]})" }.join(?,)}
"

But I shudder to think about the massive strings that are being created when my data number in the hundreds of thousands. Is there a more proper way to do this? All of the string interpolation seems really smelly.

I'm using mysql2 version 0.3.15. Note that this is not Rails or Active Record related.

Was it helpful?

Solution

If you're loading a lot of data, you can create a CSV file, move it to the DB server and let the database manager load it directly. It'll be a LOT faster.

See "13.2.6 LOAD DATA INFILE Syntax" for more information.

OTHER TIPS

I have created the following benchmark to compare which concat method is better in your escenario. Looks like you are using the better one.

                           user     system      total        real
<<                     6.540000   0.000000   6.540000 (  6.541577)
+                      7.580000   0.000000   7.580000 (  7.590544)
#{one}#{two}           5.700000   0.000000   5.700000 (  5.703538)

require 'benchmark'

values = Array.new(1000){ |i| [i.to_s, (i+1).to_s] }

Benchmark.bm(20) do |x|
  x.report ('<<') do
    10_000.times do
            values.map { |row| "(" << row[0] << "," << row[1] << ")" }.join(?,)
    end
  end
  x.report('+') do
    10_000.times do
      values.map { |row| "(" + row[0] + "," + row[1] + ")" }.join(?,)
    end
  end
  x.report('#{one}#{two}') do
    10_000.times do
        values.map { |row| "(#{row[0]},#{row[1]})" }.join(?,)
    end
  end
end

Well part of the problem is that queries are sent as strings so no matter how you do it, you eventually need to make a string. For your specific example, I would suggest batching your queries. Just do a portion of your data in a loop to avoid constructing huge strings and also overly stressing your database.

I don't know enough about ruby string internals to tell you the best way to construct your string but with a good batch size (maybe 100 records at a time) you should be fine with what you have.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top