Question

I am trying to learn how the transaction log works and have been studying it for a couple of days.

Some operations become faster when done as many small transactions, e.g. deleting many rows in small batches. See Methods of speeding up a huge DELETE FROM <table> with no clauses
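For concreteness, the batched-delete pattern I have in mind looks roughly like this (the table and column names are just placeholders):

    -- Rough sketch of a batched delete; dbo.BigTable and OldDate are made-up names
    DECLARE @rows int = 1;

    WHILE @rows > 0
    BEGIN
        DELETE TOP (5000) FROM dbo.BigTable
        WHERE OldDate < '2020-01-01';

        SET @rows = @@ROWCOUNT;  -- stop once a batch deletes no more rows
    END;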

Other operations, e.g. cursors, become faster when one big explicit transaction is wrapped around the cursor. This seems contradictory to me...
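For reference, the cursor pattern I mean is roughly this (again, made-up names), with one explicit transaction around the whole loop instead of one autocommitted transaction per statement:

    -- Sketch only: one explicit transaction wrapping the entire cursor loop
    DECLARE @id int;

    BEGIN TRANSACTION;

    DECLARE c CURSOR LOCAL FAST_FORWARD FOR
        SELECT Id FROM dbo.BigTable;

    OPEN c;
    FETCH NEXT FROM c INTO @id;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        UPDATE dbo.BigTable SET Processed = 1 WHERE Id = @id;
        FETCH NEXT FROM c INTO @id;
    END;

    CLOSE c;
    DEALLOCATE c;

    COMMIT TRANSACTION;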

Can someone please explain?

Edit with extra info: my table is rowstore with a clustered index and a nonclustered index on the predicate used in the delete.

Solution

If you are using a recent version of SQL Server, you can compare session waits via sys.dm_exec_session_wait_stats between the small-batch deletes and the single large delete. Enabling Query Store and comparing metrics such as logical reads, physical reads, and CPU time across the small batched deletes, as well as against the single delete, will also be instructive.
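If you want to try the wait-stats comparison, a minimal sketch might look like the following, run from the session doing the deletes (sys.dm_exec_session_wait_stats is available in SQL Server 2016 and later); snapshot it before and after each variant and diff the numbers:

    -- This session's accumulated waits; capture before/after each delete variant
    SELECT wait_type,
           waiting_tasks_count,
           wait_time_ms,
           signal_wait_time_ms
    FROM sys.dm_exec_session_wait_stats
    WHERE session_id = @@SPID
    ORDER BY wait_time_ms DESC;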

Without additional detail, a more thorough answer to your question would require so much speculation that it wouldn't be very useful. For example, the question as stated doesn't specify whether the table is rowstore or columnstore, whether it is partitioned, whether the table has a primary key or clustered index if it is a rowstore table, how the delete is broken into small batches, or how the batch size was chosen to be "small" (was it just a guess, or one of several candidates that was tested?).

Aaron Bertrand gives some careful consideration to many of these factors in the following blog post - in which even he seems surprised that storage subsystem characteristics can make a single delete faster than many small deletes in some cases.

Fastest way to Delete Large Number of Records in SQL Server (December 3, 2019): https://www.mssqltips.com/sqlservertip/6238/fastest-way-to-delete-large-number-of-records-in-sql-server/

Here's a possible scenario that could explain the difference in performance that you are seeing - but there are many other possible scenarios as well.

It is quite possible that the transaction log is already of sufficient size that it need not grow to log the many small, non-overlapping transactions of a single session performing small batched deletes. At the same time, a single large delete may very well require the transaction log to grow many times to accommodate logging its activity. The transaction log cannot take advantage of Instant File Initialization - if the transaction log auto-grows, the entire new growth area must be zero-initialized before writes to it can complete. For that reason, many small transaction log auto-growths to support a single large transaction could exacerbate WRITELOG waits and substantially increase the duration of a single large delete.
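If you want to check whether log growth is a factor in your test, one simple approach (a sketch, run in the database you are deleting from) is to capture the log file's size and growth setting before and after each variant:

    -- Transaction log file size and auto-growth configuration
    SELECT name,
           size * 8 / 1024 AS size_mb,   -- size is reported in 8 KB pages
           CASE WHEN is_percent_growth = 1
                THEN CAST(growth AS varchar(10)) + ' %'
                ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
           END AS growth_setting
    FROM sys.database_files
    WHERE type_desc = 'LOG';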

OTHER TIPS

As with most things, it depends...

The longer a transaction is open against a set of Tables, the longer those Tables are locked for. Locking on a Table causes contention with other queries trying to read from and write to that Table. So breaking up a transaction into multiple smaller transactions reduces the continuous time those Tables are locked for. It may also better isolate the locking to only the Tables that absolutely need to be locked for that part of the code.

But a lot of relational problems are best solved in a relational manner (as opposed to iteratively or procedurally), and there is additional overhead in creating and committing (or rolling back) multiple transactions and taking out the appropriate locks for each of them; sometimes a single transaction is best for minimizing that overhead.

For example, if you're trying to update 100 rows in a Table, in most cases updating all 100 rows in a single transaction / single UPDATE statement will be a lot quicker than taking out 100 transactions and iterating over the Table row by row, 100 times.
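A rough sketch of that contrast, with made-up table and column names (under default autocommit behavior, the loop version ends up as 100 separately committed transactions):

    -- Set-based: all qualifying rows in a single statement / single transaction
    UPDATE dbo.Demo
    SET Flag = 1
    WHERE Flag = 0;

    -- Iterative: 100 separate autocommitted transactions (illustration only)
    DECLARE @i int = 1;
    WHILE @i <= 100
    BEGIN
        UPDATE dbo.Demo SET Flag = 1 WHERE Id = @i;
        SET @i += 1;
    END;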

Generally the tradeoff is this: the query you're trying to run will itself run quickest with the fewest transactions, but it will potentially cause the most contention on the Tables involved in that query for the rest of the server / other queries accessing those Tables. By breaking the query / logic up into multiple transactions, you likely reduce the overall contention on those Tables for the rest of the server, but you may incur some performance reduction in the specific query that was broken up into multiple transactions.

So the decision comes down to how important it is for that specific query to run quickly vs. how busy the rest of the server is with the same Tables that query uses, and how long those other operations can afford to sit waiting.
