Question

Within SQL Server Integration Services (SSIS) there is the ability to setup a connection to a flat file that can hold millions of records and have that data pushed to a SQL DB. Furthermore, this process can be called from a C# app by referencing and using the Microsoft.SqlServer.Dts.Runtime namespace.

Would a flat file with millions of records best be ran with SSIS, or would the collective "you" prefer a c# app with multiple worker threads(one to read and add the row to variable, one to write from that variable to the DB), and a "mother" class that manages those threads? (the dev box has two cpu's)

I have seen this data (sql team blog) stating that for a flat file with a million lines, SSIS is the fastest:

Process                Duration (ms)
--------------------   -------------
SSIS - FastParse ON         7322 ms 
SSIS - FastParse OFF        8387 ms 
Bulk Insert                10534 ms 
OpenRowset                 10687 ms 
BCP                        14922 ms

What are your thoughts?

Was it helpful?

Solution

I can only speak for myself and my experience. I would go with SSIS, since this is one of those cases where you might be re-inventing the wheel unnecessarily. This is a repetitive task that has already been solved by SSIS.

I have about 57 jobs (combination of DTS and SSIS) that I manage on a daily basis. Four of those routinely handle exporting between 5 to 100 million records. The database I manage has about 2 billion rows. I made use of a script task to append the date, down to the millisecond, so that I can run jobs several times a day. Been doing that for about 22 months now. It's been great!

SSIS jobs can also be scheduled. So you can set it and forget it. I do monitor everything every day, but the file handling part has never broken down.

The only time I had to resort to a custom C# program, was when I needed to split the very large files into smaller chunks. SSIS is dog slow for that sort of stuff. A one gig text file took about one hour to split, using the script task. The C# custom program handled that in 12 minutes.

In the end, just use what you feel comfortable using.

OTHER TIPS

SSIS is incredibly fast. In addition, if it's something that needs to occur repeatedly, you can setup an agent to fire it off on schedule. Writing it yourself is one thing, trying to make it multithreaded gets a lot more complicated than it appears at first.

I'd recommend SSIS 9 times out of ten.

I can't see how using multiple threads would help performance in this case. When transferring large volumes of data, the main bottleneck is usually disk I/O. Spawning multiple threads wouldn't solve this issue, and my guess would be that it would make things worse since it would introduce locking contention between the multiple processes hitting the database.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top