Text::CSV_XS is extremely fast; using it to handle the CSV parsing should take care of that side of the performance problem.
There should be no need for special bulk-insert code to make DBD::SQLite perform well. An INSERT statement with bind parameters is very fast. The main trick is to turn off AutoCommit in DBI and do all the inserts in a single transaction.
use v5.10;
use strict;
use warnings;
use autodie;

use Text::CSV_XS;
use DBI;

my $dbh = DBI->connect(
    "dbi:SQLite:dbname=csvtest.sqlite", "", "",
    {
        RaiseError => 1, AutoCommit => 0
    }
);

$dbh->do("DROP TABLE IF EXISTS test");
$dbh->do(<<'SQL');
CREATE TABLE test (
    name    VARCHAR,
    num1    INT,
    num2    INT,
    thing   VARCHAR,
    num3    INT,
    stuff   VARCHAR
)
SQL

# Using bind parameters avoids having to recompile the statement every time
my $sth = $dbh->prepare(<<'SQL');
INSERT INTO test
       (name, num1, num2, thing, num3, stuff)
VALUES (?, ?, ?, ?, ?, ?)
SQL

my $csv = Text::CSV_XS->new or die;

open my $fh, "<", "test.csv";
while (my $row = $csv->getline($fh)) {
    $sth->execute(@$row);
}
$csv->eof;
close $fh;

$sth->finish;
$dbh->commit;
This ran through a 5.7 MB CSV file in 1.5 seconds on my MacBook. The file contained 70,000 lines of...
"foo",23,42,"waelkadjflkajdlfj aldkfjal dfjl",99,"wakljdlakfjl adfkjlakdjflakjdlfkj"
It might be possible to squeeze out a little more speed using bind_columns, but in my testing it actually slowed things down.
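For reference, the bind_columns variant I tried looks roughly like this (a sketch: it replaces only the read loop above, so $fh and $sth are assumed to be set up as before). Text::CSV_XS fills the bound scalars on each getline call instead of allocating a fresh array ref per row:

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new or die;

# Bind one scalar per CSV column; getline() now returns true/false
# and writes the fields directly into these variables.
$csv->bind_columns(\my ($name, $num1, $num2, $thing, $num3, $stuff));

while ($csv->getline($fh)) {
    $sth->execute($name, $num1, $num2, $thing, $num3, $stuff);
}
$csv->eof;
```

In principle this avoids building a new array ref per row, but the per-row DBI execute dominates, which is presumably why it made no measurable difference for me.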