Question

When performing many inserts into a database I would usually have code like this:

using (var connection = new SqlConnection(connStr))
{
  connection.Open();
  foreach (var item in items)
  {
     var cmd = new SqlCommand("INSERT ...")
     cmd.ExecuteNonQuery();
  }
}

I now want to shard the database and therefore need to choose the connection string based on the item being inserted. This would make my code run more like this

foreach (var item in items)
{
  connStr = GetConnectionString(item);
  using (var connection = new SqlConnection(connStr))
  {
    connection.Open();      
    var cmd = new SqlCommand("INSERT ...")
    cmd.ExecuteNonQuery();
  }
}

Which basically means it's creating a new connection to the database for each item. Will this work or will recreating connections for each insert cause terrible overhead?

Was it helpful?

Solution

I assume you are talking about C#/.NET. In that case the connections are pooled by the framework, so the overhead of creating them that way isn't that high.

Edit

As pointed out by @TomTom transactions should also be taken into considerations. If you are doing inserts into different databases on the same server you can use a normal SQL transaction for that. If the databases are on different servers you would need to use an MSDTC transaction for coordinating them across the database servers. Anyways the best way to handle transactions are by wrapping the relevant code in a TransactionScope. This doesn't conflict with opening and closing (in reality reusing from a pool) database connections.

using(new TransactionScope())
{
    // Update code to various databases, opening and closing connections.
}

With SQL2005 or newer the TransactionScope will default to a SQL Transaction first and then automatically escalate it to an MSDTC transaction if needed.

OTHER TIPS

Theoretically it is ok to create as many connections as you wanr. Recreating a connection is fast, unless you force non-connection pooling. Per standard SQL connections are not closed but put into a pool (for two minutes, iirc) for reuse.

That said, if you open a new connection for every insert you run into a serious problems with - your transaction boundaries. More complex updates need to fall under one transaction. While yoyu CAN just wrap that under a System.Transaction namespace... ...it will mean that all the connections will stay open until commit, using up a LOT of them, and it is going to force MSDTC (Distributed Transaction Coordinator) to step in - with all the overhead it has.

As such, reusing connections is a lot more advisable from an architectural point of view You basically are on a dead end with your transactional behavior, not with the connection count.

In addition to the points made by the other posters the pattern you say is your usual style reminds me of something I just read.

http://research.microsoft.com/apps/pubs/?id=76507

2.3 Identifying Opportunities for Bulk Operations

Consider a loop in the application code, shown below, inside which the application is inserting data into a table:

for (int i=0;i<MAXVALUE;i++) {
// execute SQL statement INSERT INTO T VALUES(...)
}

As written above, the code is inefficient since each time through the loop an INSERT statement is executed. A much more efficient way to achieve the same result is to use the bulk insert APIs of the data access layer, which takes advantage of batching. Note that the ingredients for identifying this problem is having the application context that a particular SQL statement is executed repeatedly inside a loop, and the database context to know that each instance is in fact an INSERT statement on a table T. It is then possible to put these pieces of information together to suggest a mapping to the bulk insert APIs.

In ADO.NET 2.0 I think this means using SqlBulkCopy

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top