The SQLite documentation says (here) that you can avoid checkpoint pauses in WAL mode by running the checkpoints on a separate thread. I tried this, and it doesn't appear to work: the -wal file grows without bound, it is unclear whether anything is actually getting copied back into the main database file, and (most importantly) once the -wal file has gotten big enough (over a gigabyte), the main thread starts having to wait for the checkpointer.
In my application the main thread continuously does something essentially equivalent to this, where generate_data spits out on the order of a million rows to be inserted:
    import sqlite3

    db = sqlite3.connect("database.db")
    cursor = db.cursor()
    cursor.execute("PRAGMA wal_autocheckpoint = 0")
    batch_size = 0  # rows inserted since the last commit; batch_limit is set elsewhere
    for datum in generate_data():
        # It is a damned shame that there is no way to do this in one operation.
        cursor.execute("SELECT id FROM strings WHERE str = ?", (datum.text,))
        row = cursor.fetchone()
        if row is not None:
            id = row[0]
        else:
            cursor.execute("INSERT INTO strings VALUES(NULL, ?)", (datum.text,))
            id = cursor.lastrowid
        cursor.execute("INSERT INTO data VALUES (?, ?, ?)",
                       (id, datum.foo, datum.bar))
        batch_size += 1
        if batch_size > batch_limit:
            db.commit()
            batch_size = 0
and the checkpoint thread does this:
    import sqlite3
    import time

    db = sqlite3.connect("database.db")
    cursor = db.cursor()
    cursor.execute("PRAGMA wal_autocheckpoint = 0")
    while True:
        time.sleep(10)
        cursor.execute("PRAGMA wal_checkpoint(PASSIVE)")
(Being on separate threads, they have to have separate connections to the database, because pysqlite doesn't support sharing a connection among multiple threads.) Changing to a FULL or RESTART checkpoint does not help: the checkpoints then just fail.
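To check whether anything is actually being copied back, the row returned by the checkpoint pragma can be inspected. The following is only a minimal sketch, assuming the documented three-column result of PRAGMA wal_checkpoint (a busy flag, the number of pages in the WAL, and the number of pages moved back into the database file). The names checkpoint_status and demo, and the throwaway table t, are hypothetical and not part of the application above.

```python
import os
import sqlite3
import tempfile


def checkpoint_status(db, mode="PASSIVE"):
    """Run a WAL checkpoint and return (busy, wal_pages, checkpointed_pages)."""
    if mode not in ("PASSIVE", "FULL", "RESTART", "TRUNCATE"):
        raise ValueError("unknown checkpoint mode: " + mode)
    # The pragma returns a single row of three integers.
    return db.execute("PRAGMA wal_checkpoint(%s)" % mode).fetchone()


def demo():
    # Hypothetical scratch database, used only to illustrate the result row.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    db = sqlite3.connect(path)
    db.execute("PRAGMA journal_mode = WAL")
    db.execute("PRAGMA wal_autocheckpoint = 0")
    db.execute("CREATE TABLE t (x)")
    db.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
    db.commit()
    status = checkpoint_status(db)
    db.close()
    return status
```

In the checkpoint thread, logging this tuple on every pass should show whether the third number keeps up with the second. If it lags further and further behind, the checkpointer is presumably being held back by another connection rather than silently doing nothing.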
How do I make this actually work? Desiderata are: 1) main thread never has to wait, 2) journal file does not grow without bound.