ETL as a transaction

https://stackoverflow.com/questions/1214312

06-07-2019
|

Question

For all the ETLs I have written so far, I have never made them transactions - i.e. if table 4 fails, roll everything back.

What is the best practice in this regard?

To "BeginTran + Commit" or not to "BeginTran + Commit"

EDIT: I have one master package calling 4 other packages - is it possible to roll them all up into one transaction?

Solution

begin+commit in manageable batch sizes. You don't want to wrap a 6 hours import into a single transaction every night. Keep your batches at a size that can finish in 2-3 minutes at most. That you will hit data purity issues that will fail an ETL is a given, so at least reduce the impact to something manageable (ie. don't trigger a rollback that will last another 6 hours to complete).

OTHER TIPS

In SSIS, I always Begin Trans + Commit. I want to make sure that I can re-run the package without issue (or having to find what rows actually got inserted) if it fails.

It just makes recovery and cleanup so much easier.

You are often moving too much data in ETL to use a SQL transaction (the log has to store ALL the data to roll back, remember). I prefer to design packages such that they can be re-run nondestructively. Ideally they should be set up so that if they die in mid-stream, you can just start them and they'll continue somewhere approximately where they left off. Sometimes there's a performance penalty for that, but I think it's worth it.

Technically you can roll packages up into a single transaction; practically, maybe not.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow