Question

I am developing an application that iterates over many (1,000,000+) rows in a table while inserting new rows and updating existing rows along the way. It is a requirement that the SELECT statement yields every row that exists in the table when the SELECT is first executed exactly once, and never yields rows that are inserted after the SELECT is executed. I would prefer not to load all rows into memory; I tried that, and it takes a long time and a lot of RAM.

I have written a small Python example that demonstrates that SQLite apparently does not isolate a long-running SELECT from INSERTs (and presumably UPDATEs and DELETEs) on the same connection. I was not able to find any place in the SQLite documentation that specifically describes this behavior, but I found several links suggesting that the INSERT would fail (possibly in earlier versions of SQLite?), which it does not in my example.

import sqlite3

def select_affected_by_insert():
    # select from and simultaneously modify same table
    cn = sqlite3.connect(':memory:')
    cn.execute("CREATE TABLE demo (v INTEGER PRIMARY KEY)")

    n = 5
    values = [[v] for v in range(n)]
    cn.executemany('INSERT INTO demo VALUES (?)', values)

    for (v,) in cn.execute('SELECT v FROM demo'):

        with cn:
            # insert in transaction
            cn.execute('INSERT INTO demo VALUES (?)', [n + v])

        print v, n + v

        # If the running SELECT sees the rows inserted above, v eventually
        # reaches n and this assertion fails.
        assert v < n, 'got more rows than expected!'

if __name__ == '__main__':
    select_affected_by_insert()

SQLite 3.6.12
Python 2.6.4

Is there a better way to work around this than copying the data to a separate (temporary) table and selecting from there?

Clarification: I neglected to say that I need to do commits inside the loop. The process may be interrupted, and partially done work must be committed so it does not need to be redone on the next run-through.
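
For concreteness, the temporary-table workaround I am referring to looks roughly like the following sketch, which copies the existing keys into a hypothetical todo work-queue table and commits after every row (the table and file names are only placeholders):

import sqlite3

cn = sqlite3.connect('demo.db')  # placeholder file name
cn.execute('CREATE TABLE IF NOT EXISTS demo (v INTEGER PRIMARY KEY)')
cn.executemany('INSERT OR IGNORE INTO demo VALUES (?)', [[v] for v in range(5)])

# Copy the keys that exist right now into a persistent work-queue table;
# rows inserted later are never added to it.
cn.execute('CREATE TABLE IF NOT EXISTS todo (v INTEGER PRIMARY KEY)')
cn.execute('INSERT OR IGNORE INTO todo SELECT v FROM demo')
cn.commit()

while True:
    # fetchall() finishes the SELECT, so no statement is pending at commit time
    rows = cn.execute('SELECT v FROM todo LIMIT 1').fetchall()
    if not rows:
        break
    (v,) = rows[0]
    # ... insert into / update demo for this row ...
    cn.execute('DELETE FROM todo WHERE v = ?', [v])
    cn.commit()  # commit inside the loop so interrupted work is not redone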


Solution

  1. Use WAL mode (so the writer and reader do not interfere).
  2. Use separate connections for the reader and the writer, as in the sketch below.
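
A minimal sketch of that approach, assuming a file-backed database and the same demo table as in the question; the demo.db file name is a placeholder, and note that WAL mode requires SQLite 3.7.0 or later and does not apply to a ':memory:' database:

import sqlite3

DB = 'demo.db'  # placeholder file name; WAL needs an on-disk database

# Separate connections for the reader and the writer.
reader = sqlite3.connect(DB)
writer = sqlite3.connect(DB)

# Enable write-ahead logging so the writer does not block the reader.
writer.execute('PRAGMA journal_mode=WAL')

writer.execute('DROP TABLE IF EXISTS demo')
writer.execute('CREATE TABLE demo (v INTEGER PRIMARY KEY)')
n = 5
writer.executemany('INSERT INTO demo VALUES (?)', [[v] for v in range(n)])
writer.commit()

for (v,) in reader.execute('SELECT v FROM demo'):
    # The reader keeps the snapshot it took when the SELECT started, so rows
    # committed by the writer below are not returned by the ongoing SELECT.
    writer.execute('INSERT INTO demo VALUES (?)', [n + v])
    writer.commit()  # commit inside the loop, as the question requires
    print v, n + v

In WAL mode the reader holds the snapshot taken when its SELECT began, so the writer's commits never appear in the running loop and each of the original rows is returned exactly once.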

OTHER TIPS

If you open your database in deferred transaction mode and COMMIT only at the end of your SELECT-INSERT logic, like so:

cn = sqlite3.connect(':memory:', isolation_level='DEFERRED')
...
for (v,) in cn.execute('SELECT v FROM demo'):
    cn.execute('INSERT INTO demo VALUES (?)', [n + v])
cn.commit()

then the commits of your INSERT statements are deferred until the end of the block. From the SQLite documentation on transaction control:

If multiple commands are being executed against the same SQLite database connection at the same time, the autocommit is deferred until the very last command completes. For example, if a SELECT statement is being executed, the execution of the command will pause as each row of the result is returned. During this pause other INSERT, UPDATE, or DELETE commands can be executed against other tables in the database. But none of these changes will commit until the original SELECT statement finishes.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow