Question

I would like to modify some data with INSERTs and UPDATEs. From the psycopg tutorials it looks like I need

cur = connection.cursor()
cur.execute(my_insert_statement)
connection.commit()

Psycopg's cursor class seems to have little to do with cursors as defined by Postgres.

If I modularize my script, creating a connection in the main module and some worker functions (no threading, just for modularization), should I

  1. pass the connection to the functions and create a new cursor every time? Is there significant overhead in creating new cursor objects frequently?

    def process_log_file(self, connection):
    
  2. pass both the connection and the cursor, which makes function signatures and implementations needlessly complicated?

    def process_log_file(self, connection, cursor):
    
  3. pass only the cursor as a parameter and use mycursor.connection.commit() for committing?

    def process_log_file(self, cursor):
    

Solution

Any of the three will work (it is mainly a matter of personal taste), but I prefer (1). Here is why:

The cursor type is lightweight, and creating one does nothing special apart from creating a new Python object. You're welcome to create, use (then commit or roll back on the connection) and destroy as many cursors as you like, especially if that helps you keep the code clean and organized.

Also, cursors are important when you're working with complex logic that needs access to data coming from multiple, different queries: in this case the cursor acts as a holder/iterator for your data.

In the end, passing around the connection (your real handle to the backend) and keeping the cursors local to the specific function/method just "feels right".
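As a rough illustration of option (1), here is a minimal sketch of a worker function that receives only the connection and keeps its cursor local. The table name log_lines and the lines argument are made up for the example, and the function assumes a psycopg-style connection that uses %s placeholders.

```python
def process_log_file(connection, lines):
    """Insert each log line, then commit; roll back if anything fails."""
    cur = connection.cursor()  # cheap: just a new Python object
    try:
        for line in lines:
            # %s is psycopg's placeholder style; log_lines is a made-up table.
            cur.execute("INSERT INTO log_lines (line) VALUES (%s)", (line,))
        connection.commit()
    except Exception:
        connection.rollback()
        raise
    finally:
        cur.close()


# Hypothetical usage (needs a reachable database):
#   import psycopg2
#   connection = psycopg2.connect("dbname=test")
#   process_log_file(connection, ["first line", "second line"])
```

The caller only ever deals with the connection; the cursor never leaks out of the function.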

OTHER TIPS

Cursors support the with statement, which automatically closes them once the block has completed. That can be a very useful pattern when you are doing compact operations with a cursor.
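A small sketch of that pattern, assuming a psycopg2 connection (version 2.5 or later, where cursors are context managers) and a made-up log_lines table:

```python
def count_log_lines(connection):
    """Return the row count of the hypothetical log_lines table."""
    # Leaving the with block closes the cursor; it does not commit
    # the transaction or close the connection.
    with connection.cursor() as cur:
        cur.execute("SELECT count(*) FROM log_lines")
        return cur.fetchone()[0]
```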

At other times, a cursor may need to be used throughout a function, or multiple cursors may be needed. In that case the with pattern makes less sense, and the cursor is best declared at function scope.

Also keep in mind the importance of named cursors, which is where psycopg cursors and Postgres cursors intertwine. Simply by passing a name when you create the cursor, you get a server-side cursor, which can then be iterated over just like any Python collection and which performs chunked fetches.

The chunk size can be altered (through the cursor's itersize attribute), although it fetches in blocks of 2000 by default. This is particularly important when querying large tables, as you can quickly run out of memory client side with a huge result set. psycopg spares you from dealing with Postgres cursors directly, and the next chunk is fetched transparently during iteration of the cursor when needed.
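For illustration, a named cursor with a custom chunk size might be used like this. This is a sketch assuming psycopg2; the cursor name, table and columns are invented for the example.

```python
def iter_log_lines(connection, chunk_size=500):
    """Yield rows from a server-side cursor, fetched chunk_size at a time."""
    # Passing a name makes psycopg2 create a server-side (named) cursor.
    with connection.cursor(name="log_lines_reader") as cur:
        cur.itersize = chunk_size  # rows fetched per round trip while iterating
        cur.execute("SELECT id, line FROM log_lines ORDER BY id")
        for row in cur:  # the next chunk is fetched as needed
            yield row
```

Because the function is a generator, the caller can loop over a very large table without ever holding the whole result set in memory.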

Keep in mind that a named cursor can really only be used for one thing -- a query that you then iterate over; if you try to execute another query on the same cursor it will, if memory serves, throw an exception. With non-named cursors you can re-use the same cursor across executes once you're done with the results.

I generally use named cursors for any queries that I think might even have a chance of returning a fairly large result set, and non-named cursors for small queries and other commands such as updates, deletes, table creates, etc.

The Python DB API Specification says:

"a database cursor … is used to manage the context of a fetch operation."

So looking at modularity, it seems that unless your function needs the results of a previous operation, it would make more sense to create new cursors and either close them, or let them be closed on their own when they leave scope. If you have an operation that you repeat many times, and are sure that recreating the cursor will introduce overhead, you could always create a helper class that wraps a cursor instead of a simple helper function.
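One possible shape for such a helper class is sketched below. The class name LogWriter, its methods and the log_lines table are all hypothetical, and since cursor creation is cheap this is probably only worth doing if measurements show a real cost.

```python
class LogWriter:
    """Hypothetical helper that keeps one cursor for many inserts."""

    def __init__(self, connection):
        self.connection = connection
        self.cursor = connection.cursor()  # created once, reused by write()

    def write(self, line):
        # %s is the psycopg placeholder style.
        self.cursor.execute(
            "INSERT INTO log_lines (line) VALUES (%s)", (line,)
        )

    def close(self, commit=True):
        """Close the cursor, then commit or roll back the transaction."""
        self.cursor.close()
        if commit:
            self.connection.commit()
        else:
            self.connection.rollback()
```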

However, all three of your methods should work fine. I've personally written code using style #2, though I agree that it seems to be the worst of them.

Licensed under: CC-BY-SA with attribution