Batch DB2 Performance of COMMIT/ROLLBACK VS ROLLBACK TO SAVEPOINT

https://stackoverflow.com/questions/10071387

30-05-2021

Question

In a COBOL batch program, which of the following is better in performance terms?

With commit:

IF SW-NEW-TRANSACT
  EXEC SQL
      COMMIT
  END-EXEC
END-IF.
PERFORM SOMETHING
   THRU SOMETHING-EXIT.
IF SW-ERROR
  EXEC SQL
      ROLLBACK
  END-EXEC
END-IF.

With savepoints:

IF SW-NEW-TRANSACT
  EXEC SQL
      SAVEPOINT NAMEPOINT ON ROLLBACK RETAIN CURSORS
  END-EXEC
END-IF.
PERFORM SOMETHING
   THRU SOMETHING-EXIT.
IF SW-ERROR
  EXEC SQL
      ROLLBACK TO SAVEPOINT NAMEPOINT
  END-EXEC
END-IF.

Solution

SAVEPOINTs and COMMITs are not interchangeable.

A process always has to COMMIT or ROLLBACK database work at some point. A COMMIT is taken between transactions (between complete units of work). COMMIT may be taken after each transaction or, as is common in batch processing, after some number of transactions. A COMMIT should never be taken mid-transaction, because this defeats the UNIT OF WORK concept.

A SAVEPOINT is generally taken and possibly released within a single unit of work. SAVEPOINTs should always be released upon completion of a unit of work. They are always released upon a COMMIT.

The purpose of a SAVEPOINT is to allow partial backout from within a unit of work. This is useful when a process begins with a sequence of common database inserts/updates followed by a process branch where some updates may be performed before it can be determined that the alternate process branch should have been executed. The SAVEPOINT allows backing out of the "blind alley" branch and then continuing on with the alternate branch while preserving the common "up front" work.

Without a SAVEPOINT, backing out of a "blind alley" might have required extensive data buffering within the transaction (complex processing) or a ROLLBACK and re-do from the start of the transaction with some sort of flag indicating that the alternative process branch needs to be followed. All this leads to complex application logic.

ROLLBACK TO SAVEPOINT has several advantages. It can preserve "up front" work, saving the cost of doing it over. It saves rolling back the entire transaction. Rollbacks can be more "expensive" than the original inserts/updates were and may span multiple transactions (depending on the commit frequency). Finally, process complexity is generally reduced when database work can be selectively "undone" through a ROLLBACK TO SAVEPOINT.
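The "blind alley" pattern might look roughly like the following PROCEDURE DIVISION fragment. This is only a sketch: the paragraph names, the SW-BLIND-ALLEY switch and the savepoint name BRANCHPT are hypothetical, and SQLCODE checking is left out for brevity.

```cobol
      *> Sketch only: paragraph, switch and savepoint names are
      *> hypothetical and not taken from the original program.
       PROCESS-ONE-TRANSACTION.
      *> Common "up front" work that must survive a partial backout.
           PERFORM APPLY-COMMON-UPDATES
              THRU APPLY-COMMON-UPDATES-EXIT.
      *> Mark the point to return to if the first branch fails.
           EXEC SQL
               SAVEPOINT BRANCHPT ON ROLLBACK RETAIN CURSORS
           END-EXEC.
           PERFORM TRY-BRANCH-A
              THRU TRY-BRANCH-A-EXIT.
           IF SW-BLIND-ALLEY
      *>       Undo only branch A; the common updates remain.
               EXEC SQL
                   ROLLBACK TO SAVEPOINT BRANCHPT
               END-EXEC
               PERFORM DO-BRANCH-B
                  THRU DO-BRANCH-B-EXIT
           END-IF.
      *> The unit of work is complete, so release the savepoint.
           EXEC SQL
               RELEASE SAVEPOINT BRANCHPT
           END-EXEC.
       PROCESS-ONE-TRANSACTION-EXIT.
           EXIT.
```

Note that no COMMIT appears here. The savepoint lives entirely inside one unit of work, and the commit decision stays with whatever logic drives the transaction loop.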

How might SAVEPOINT be used to improve the efficiency of your batch program? If your transactions employ self-induced rollbacks to recover from "blind alley" processing, then SAVEPOINT can be a huge benefit. Similarly, if the internal processing logic is complicated by the need to avoid performing database updates for similar "backout" requirements, then SAVEPOINT can be used to refactor the process into something that is quite a bit simpler and probably more efficient. Outside of these factors, SAVEPOINT is not going to affect performance in a positive manner.

Some claim that a high COMMIT frequency in a batch program reduces performance and that, consequently, the lower the commit frequency, the better the performance. However, tuning COMMIT frequency is not trivial. The lower the commit frequency, the longer database resources are held and, consequently, the greater the probability of inducing database timeouts. Suffering a database timeout generally causes a process to roll back. The rollback is a very expensive operation: ROLLBACKs are a big hit to the DBMS itself, and your transaction needs to re-apply all of the updates a second time once it is restarted. Lowering commit frequency can end up costing you a lot more than it gains. BEWARE!

EDIT

Rule of thumb: Commits have a cost. Rollbacks have a higher cost.

Discounting rollbacks due to bad data, device failure and program abends (all of which should be rare), most rollbacks are caused by timeouts due to resource contention among processes. Doing fewer commits increases database contention, but it may also improve performance. The trick is to find the point where the performance gained by not committing outweighs the cost of rollbacks due to contention. There are a large number of factors that influence this, many of them dynamic. My overall advice is to look elsewhere to improve performance: tuning commit frequency (where timeouts are not the issue) is generally a low-return investment.

Other, more fruitful ways to improve batch performance often involve:

improving parallelism by load splitting and running multiple images of the same job
analyzing DB2 bind plans and optimizing access paths
profiling the behaviour of the batch programs and refactoring those parts consuming the most resources
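
As an illustration of the load-splitting idea, each image of the job could be handed its own key range and read only that slice of the input. The fragment below is a sketch under that assumption. The table ACCOUNT, its columns, the cursor name INPUTCSR and the host variables WS-LOW-KEY and WS-HIGH-KEY are all hypothetical.

```cobol
      *> Sketch only: table, column, cursor and host-variable names
      *> are hypothetical. Each job image sets its own key range
      *> (for example from a parameter card) before the OPEN.
           EXEC SQL
               DECLARE INPUTCSR CURSOR WITH HOLD FOR
                   SELECT ACCT_ID, ACCT_BAL
                     FROM ACCOUNT
                    WHERE ACCT_ID BETWEEN :WS-LOW-KEY
                                      AND :WS-HIGH-KEY
                    ORDER BY ACCT_ID
           END-EXEC.
```

Non-overlapping key ranges should also tend to reduce contention between the images, although whether they actually do depends on the table design and locking configuration.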

OTHER TIPS

This isn't a performance issue at all.

You COMMIT when you finish a unit of work, whatever a unit of work means to your application. Usually, it means that you've processed a complete transaction. In the batch world, you'd take a commit after 1,000 to 2,000 transactions, so you don't spend all your time committing. The number depends on how many transactions you can afford to rerun in the event of a ROLLBACK.
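
A commit interval of this kind is often driven by a simple counter. The following is a minimal sketch, not a complete program: WS-COMMIT-INTERVAL, WS-TRAN-COUNT, SW-END-OF-INPUT and the paragraph names are hypothetical, and restart/checkpoint logic and SQLCODE checking are omitted.

```cobol
      *> Sketch only: all data and paragraph names are hypothetical.
      *> Assumed WORKING-STORAGE items:
      *>   01 WS-COMMIT-INTERVAL  PIC 9(9) COMP VALUE 1000.
      *>   01 WS-TRAN-COUNT       PIC 9(9) COMP VALUE 0.
       MAIN-LOOP.
           PERFORM UNTIL SW-END-OF-INPUT
               PERFORM PROCESS-ONE-TRANSACTION
                  THRU PROCESS-ONE-TRANSACTION-EXIT
               ADD 1 TO WS-TRAN-COUNT
      *>       Commit only between complete transactions.
               IF WS-TRAN-COUNT >= WS-COMMIT-INTERVAL
                   EXEC SQL
                       COMMIT
                   END-EXEC
                   MOVE 0 TO WS-TRAN-COUNT
               END-IF
           END-PERFORM.
      *> Commit whatever is left over after the final partial batch.
           IF WS-TRAN-COUNT > 0
               EXEC SQL
                   COMMIT
               END-EXEC
           END-IF.
```

Keeping the interval in a data item rather than hard-coding it makes it easier to adjust if timeouts start to appear. Any cursor that has to stay open across these commits would need to be declared WITH HOLD, since DB2 otherwise closes it at COMMIT.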

You ROLLBACK when you've encountered an error of some sort, either a database error or an application error.

You SAVEPOINT when you are processing a complex unit of work, and you want to save what you've done without taking a full COMMIT. In other words, you would take one or more SAVEPOINTs and then finally take a COMMIT.

Licensed under: CC-BY-SA with attribution
