Question

How does SSDT handle batching of Data Flow tasks?

I am loading a CSV of about 1 GB into SQL Server using SSDT.

In the Data Flow there is a Flat File Source that feeds an OLE DB Destination (a staging table). A stored procedure is then run via an Execute SQL Task.

The CSV contains data for both a summary table and a child table with a foreign key reference to the summary table. As a result, summary IDs are duplicated in the file (one line per child row). If a single summary ID were split across two batches, I would lose data, because the SP does something like "delete from the child table where the ID is in staging, then re-insert from staging into the child table" (sketched below). Previously we had to do it this way because the vendor wasn't exporting a unique ID for the child rows. They are now, so I can use a MERGE statement instead.
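For illustration, here is a minimal T-SQL sketch of both approaches; the table and column names (dbo.Staging, dbo.Child, SummaryID, ChildID, Payload) are hypothetical stand-ins, not the actual schema:

    -- Old approach: delete then re-insert, keyed on the summary ID.
    -- Only safe if every row for a given SummaryID arrives in the
    -- staging table within the same load.
    DELETE c
    FROM dbo.Child AS c
    WHERE c.SummaryID IN (SELECT s.SummaryID FROM dbo.Staging AS s);

    INSERT INTO dbo.Child (ChildID, SummaryID, Payload)
    SELECT s.ChildID, s.SummaryID, s.Payload
    FROM dbo.Staging AS s;

    -- New approach: with a unique ChildID now exported by the vendor,
    -- MERGE keys on the child row itself, so batch boundaries in the
    -- staging load no longer risk losing data.
    MERGE dbo.Child AS tgt
    USING dbo.Staging AS src
        ON tgt.ChildID = src.ChildID
    WHEN MATCHED THEN
        UPDATE SET tgt.SummaryID = src.SummaryID,
                   tgt.Payload = src.Payload
    WHEN NOT MATCHED THEN
        INSERT (ChildID, SummaryID, Payload)
        VALUES (src.ChildID, src.SummaryID, src.Payload);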

Still, I would like to know whether Data Flow tasks are batched, and if so, how?


Solution

As mentioned in the comment under the question, an answer to this question is included in this post: http://blogs.lobsterpot.com.au/2011/02/17/the-ssis-tuning-tip-that-everyone-misses/

That is, batching happens by default: the Data Flow pipeline moves rows in buffers of up to 10,000 rows (the Data Flow Task's DefaultBufferMaxRows property).
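As a point of reference, these are the settings that control the batching, shown with their documented defaults (assuming SSIS 2008 or later; the property names are real SSIS properties, and the values are the defaults rather than figures taken from the linked post):

    -- Data Flow Task properties (pipeline buffers, i.e. the 10k batches):
    DefaultBufferMaxRows = 10000            -- maximum rows per buffer
    DefaultBufferSize    = 10485760         -- maximum bytes per buffer (10 MB)

    -- OLE DB Destination with fast load (transaction boundaries):
    FastLoadMaxInsertCommitSize = 2147483647  -- "Maximum insert commit size";
                                              -- 0 commits all rows in one batch

Note the distinction: the buffer settings determine how many rows arrive at the destination at a time, while the commit size determines how many rows are committed per transaction.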
