The original, primary use for savepoints is to roll back part of a transaction.
Say you want to accept a large number of log entries, but need to process them into the database in batches:
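
    import transaction

    for batch in per_batch(log_entries):
        # Mark a point we can return to if this batch fails.
        sp = transaction.savepoint()
        try:
            process_batch(batch)
        except BatchFailedException:
            # Undo only the failed batch; earlier batches are kept and committed.
            sp.rollback()
            transaction.commit()
            raise
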
If a batch fails, the transaction is still committed, but without the failed batch, which has been rolled back.
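
The batching helper is not part of ZODB or the transaction package; per_batch is just a name used for this example. A minimal sketch of what it could look like, assuming a batch_size parameter (my own invention) and any iterable of entries:

```python
from itertools import islice


def per_batch(entries, batch_size=100):
    """Yield successive lists of at most batch_size items from entries."""
    iterator = iter(entries)
    while True:
        # islice consumes up to batch_size items without loading the rest.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


print(list(per_batch(range(5), batch_size=2)))  # [[0, 1], [2, 3], [4]]
```

Because it works on an iterator, this never needs the whole set of log entries in memory at once, which fits the batch-per-savepoint pattern.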
That was the original reason to use savepoints. Setting a savepoint also has the side effect of triggering a ZODB cache garbage collection run.
The ZODB holds a cache of recently accessed objects. This includes objects that don't actually change during the current transaction; you just retrieved them from the database, used their data, and then stopped directly referencing them. The ZODB stores an object graph; one object references other objects, which in turn reference other objects. Each of those objects, if it inherits from the Persistent base class, is a separate ZODB record. When you traverse the graph, these objects are all loaded into memory.
The GC run clears them from memory again, provided they haven't changed. Traversing the object graph again would load them into memory again, but clearing them during a savepoint saves memory.
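
To make the idea concrete, here is a toy model of that behaviour. It is not ZODB code and does not use the ZODB API; ToyObjectCache and its methods are made up for illustration, and it is simplified in that it evicts every unchanged object at a savepoint, whereas the real cache trims toward a configured size.

```python
class ToyObjectCache:
    """Toy stand-in for a per-connection object cache (not ZODB's real one)."""

    def __init__(self, database):
        self._database = database  # oid -> stored state
        self._loaded = {}          # oid -> in-memory copy of the state
        self._changed = set()      # oids modified in this transaction

    def get(self, oid):
        # Load from the "database" on first access, then serve from memory.
        if oid not in self._loaded:
            self._loaded[oid] = dict(self._database[oid])
        return self._loaded[oid]

    def set(self, oid, key, value):
        self.get(oid)[key] = value
        self._changed.add(oid)

    def savepoint(self):
        # Drop objects that were only read; modified ones must stay around.
        for oid in list(self._loaded):
            if oid not in self._changed:
                del self._loaded[oid]

    def loaded_oids(self):
        return sorted(self._loaded)


cache = ToyObjectCache({"a": {"n": 1}, "b": {"n": 2}, "c": {"n": 3}})
cache.get("a")
cache.get("b")
cache.set("c", "n", 30)
cache.savepoint()
print(cache.loaded_oids())  # ['c']
```

Reading "a" again after the savepoint simply loads it from the database once more, which is the trade-off described above: a bit of reloading in exchange for lower memory use.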
Savepoint data itself is stored on disk in a TmpStorage file, in your TEMP directory. This uses a tempfile.TemporaryFile() object, which for security reasons is created in an unlinked state; the file exists, but the directory entry is cleared immediately on creation. You therefore cannot see this file from outside the ZODB process.
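
You can see the unlinked behaviour with the standard library alone. This is a small sketch, not how ZODB itself calls tempfile; write_hidden_tempfile and the scratch directory are assumptions made for the demonstration. It writes to a tempfile.TemporaryFile() inside an otherwise empty directory and lists that directory while the file is still open:

```python
import os
import tempfile


def write_hidden_tempfile(payload):
    """Write payload to an anonymous temp file; return (read_back, dir_entries)."""
    with tempfile.TemporaryDirectory() as scratch_dir:
        with tempfile.TemporaryFile(dir=scratch_dir) as tmp:
            tmp.write(payload)
            tmp.seek(0)
            read_back = tmp.read()
            # List the directory while the file is still open and in use.
            entries = os.listdir(scratch_dir)
    return read_back, entries


data, entries = write_hidden_tempfile(b"savepoint data")
print(data)
print(entries)
```

On POSIX systems the directory listing should come back empty even though the data can be read back through the open file object. On Windows the file cannot be unlinked while open, so an entry may show up there until the file is closed.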
A full commit moves the data into the actual ZODB database and finalises the transaction.