Question

I am working on a desktop application that bulk inserts tens of thousands of rows into a SQLite database, and I would like help optimizing the bulk insert performance. It currently takes up to 50 seconds to insert 60 MB worth of data.

  • What connection string parameters could I use to improve performance? Should I change the buffer size, and is that possible via a connection string parameter? Are there any other connection string parameters that would improve performance? My current connection string is:

    Data Source=Batch.db;Version=3;Pooling=True;Max Pool Size=10;Synchronous=off;FailIfMissing=True;Journal Mode=Off;

  • I am using the Dapper ORM (built by the team at Stack Overflow). Is there a faster way to bulk insert into SQLite in .NET?

  • System.Data.SQLite is being used to insert into SQLite. Would a specially compiled version of SQLite improve performance? Is one build of SQLite better than another? I am currently using System.Data.SQLite from http://sqlite.phxsoftware.com

  • Currently, I am wrapping the inserts inside a transaction to make them faster (this made a good improvement); see the sketch after this list.

  • I am inserting into 17 tables, one table at a time. Could I parallelize this across threads and make it faster?
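
For reference, here is a minimal sketch of the transaction-wrapped insert pattern I am describing, using System.Data.SQLite and the same connection string settings as above. The table, columns, and row data are placeholders, not my real schema:

    using System.Data.SQLite;

    // Placeholder row data; in the real application these come from my domain objects.
    var rows = new[]
    {
        new { ColA = 1, ColB = "first" },
        new { ColA = 2, ColB = "second" }
    };

    using (var cnn = new SQLiteConnection(
        "Data Source=Batch.db;Version=3;Synchronous=Off;Journal Mode=Off;"))
    {
        cnn.Open();

        using (var transaction = cnn.BeginTransaction())
        using (var command = cnn.CreateCommand())
        {
            command.Transaction = transaction;
            command.CommandText = "INSERT INTO SomeTable (ColA, ColB) VALUES (@a, @b)";

            foreach (var row in rows)
            {
                command.Parameters.Clear();
                command.Parameters.AddWithValue("@a", row.ColA);
                command.Parameters.AddWithValue("@b", row.ColB);
                command.ExecuteNonQuery();
            }

            // One commit at the end is what makes the per-row inserts cheap.
            transaction.Commit();
        }
    }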

Current Performance. Is this typical? Can I do better?

  • 55,000 rows into table with 19 columns: 2.25 sec to insert (24k inserts/sec)
  • 10,000 rows into table with 63 columns: 2.74 sec to insert (3.7k/sec)

I like SQLite, but I would love to make it a bit faster. Currently, saving my objects to an XML file using XML serialization is faster than saving them to the SQLite database, so my boss is asking: why switch to SQLite? Or should I be using MongoDB or some other object database instead?


Solution

So I finally found the trick to high-performance bulk inserts in SQLite using .NET. This trick improved insert performance by a factor of 4.1: my total save time went from 27 seconds down to 6.6 seconds. Wow!

This article explains the fastest way to do bulk inserts into SQLite: the key is to reuse the same parameter objects, assigning each record's values to them before every insert. The time .NET spends constructing all those DbParameter objects really adds up; for example, 100k rows with 30 columns means 3 million parameter objects that must be created. Creating and reusing just 30 parameter objects instead is much faster.

New performance:

  • 55,000 rows (19 columns) in .53 seconds = 100k inserts/second

        // The INSERT statement lives in my BatchConstants class.
        // (Requires System.Data.Common and System.Linq.)
        internal const string PeakResultsInsert =
            @"INSERT INTO PeakResult values(@Id,@PeakID,@QuanPeakID,@ISTDRetentionTimeDiff)";

        // Create the command and its four parameters once, outside the loop.
        var command = cnn.CreateCommand();
        command.CommandText = BatchConstants.PeakResultsInsert;

        string[] parameterNames = new[]
        {
            "@Id",
            "@PeakID",
            "@QuanPeakID",
            "@ISTDRetentionTimeDiff"
        };

        DbParameter[] parameters = parameterNames.Select(pn =>
        {
            DbParameter parameter = command.CreateParameter();
            parameter.ParameterName = pn;
            command.Parameters.Add(parameter);
            return parameter;
        }).ToArray();

        // Reuse the same parameter objects for every row: only the values change.
        foreach (var peakResult in peakResults)
        {
            parameters[0].Value = peakResult.Id;
            parameters[1].Value = peakResult.PeakID;
            parameters[2].Value = peakResult.QuanPeakID;
            parameters[3].Value = peakResult.ISTDRetentionTimeDiff;

            command.ExecuteNonQuery();
        }

It turns out that I could not use Dapper for inserting into my large tables. (For my small tables, I still use Dapper.)
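
For the small tables that stayed on Dapper, the insert side is just a call to Execute with a list. Here is a rough sketch; the Detector table and the detectors collection are illustrative placeholders, not my real schema:

    using Dapper;

    // Rough sketch of a small-table insert via Dapper. Passing an IEnumerable as
    // the parameter makes Dapper run the INSERT once per element.
    var detectors = new[]
    {
        new { Id = 1, Name = "FID" },
        new { Id = 2, Name = "TCD" }
    };

    using (var transaction = cnn.BeginTransaction())
    {
        cnn.Execute(
            "INSERT INTO Detector (Id, Name) VALUES (@Id, @Name)",
            detectors,
            transaction);
        transaction.Commit();
    }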

Some other things that I found:

  • I tried using multiple threads to insert data into the same database; this did not make any improvement.

  • I upgraded from System.Data.SQLite 1.0.69 to 1.0.79; this made no performance difference that I could see.

  • I am not assigning a DbType to the DbParameter; it does not seem to make a performance difference either way.

  • For reads, I could not improve on Dapper's performance.
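
For completeness, the read path that stayed on Dapper is roughly this; the SELECT and the PeakResult type mirror the insert code above (requires "using Dapper;" and "using System.Linq;"):

    // Query<T> maps each returned row straight onto a PeakResult object.
    var peakResults = cnn.Query<PeakResult>(
        "SELECT Id, PeakID, QuanPeakID, ISTDRetentionTimeDiff FROM PeakResult")
        .ToList();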

OTHER TIPS

"Currently, I am wrapping inserts inside a transaction to make them faster (this made a good improvement)."

The biggest gain I've seen in bulk insert speed came from breaking the inserts into smaller chunks. How small a chunk varies by platform, schema, etc., I'm sure; I believe in my tests it was around 1,000 rows.
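
A minimal sketch of that chunking, assuming a parameterized insert like the one in the accepted answer; ChunkSize, rows, and InsertRow are placeholder names, not real code from either post:

    // Commit every ChunkSize rows instead of holding one giant transaction open.
    // ChunkSize, rows, and InsertRow are placeholders; tune the size per schema.
    const int ChunkSize = 1000;
    var transaction = cnn.BeginTransaction();
    try
    {
        int count = 0;
        foreach (var row in rows)
        {
            InsertRow(command, row);   // reuse the prepared command/parameters per row

            if (++count % ChunkSize == 0)
            {
                transaction.Commit();
                transaction.Dispose();
                transaction = cnn.BeginTransaction();
            }
        }
        transaction.Commit();
    }
    finally
    {
        transaction.Dispose();
    }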

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow