Question

How does SSDT handle batching of Data Flow tasks?

I am loading a CSV of about 1 GB into SQL Server using SSDT.

In the Data Flow there is a Flat File source that goes to an OLE DB destination (a staging table). A stored procedure (SP) is then run via an Execute SQL Task.

The CSV being parsed contains a summary table and a child table with a foreign key reference back to the summary table, so the CSV repeats each summary ID (one line per child row). If a single summary ID were split across two batches, I would lose data: the SP does something like "delete from the child table where the ID is in staging, then re-insert from staging into the child table" (sketched below). Previously we had to do this because the vendor wasn't exporting a unique ID for the child data. They are now, so I can use a MERGE statement instead.
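For reference, a minimal sketch of that delete/re-insert pattern; the table and column names (dbo.ChildStaging, dbo.Child, SummaryID, Amount) are placeholders, not the actual schema:

    -- Hypothetical schema: dbo.ChildStaging mirrors dbo.Child and is
    -- keyed only by the (non-unique) SummaryID.
    DELETE c
    FROM dbo.Child AS c
    WHERE c.SummaryID IN (SELECT s.SummaryID FROM dbo.ChildStaging AS s);

    INSERT INTO dbo.Child (SummaryID, Amount)
    SELECT s.SummaryID, s.Amount
    FROM dbo.ChildStaging AS s;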

But I would still like to know: are Data Flow tasks batched, and if so, how?

Solution

As mentioned in the comment under the question, an answer is covered in this post: http://blogs.lobsterpot.com.au/2011/02/17/the-ssis-tuning-tip-that-everyone-misses/

That is, batching happens by default: the Data Flow moves rows through the pipeline in buffers of 10,000 rows (the DefaultBufferMaxRows property; DefaultBufferSize, 10 MB by default, can cap a buffer at fewer rows).
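Given that per-batch behaviour, the MERGE approach mentioned in the question sidesteps the split-ID hazard, because each child row is matched on its own unique key rather than on the repeated summary ID. A minimal sketch, again using placeholder names, with ChildID standing in for the vendor's new unique child key:

    -- Hypothetical schema as before, now with a unique ChildID per row.
    MERGE dbo.Child AS tgt
    USING dbo.ChildStaging AS src
        ON tgt.ChildID = src.ChildID
    WHEN MATCHED THEN
        UPDATE SET tgt.SummaryID = src.SummaryID,
                   tgt.Amount    = src.Amount
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (ChildID, SummaryID, Amount)
        VALUES (src.ChildID, src.SummaryID, src.Amount);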

Licensed under: CC-BY-SA with attribution