Pergunta

I am playing a bit with the python api for sqlite3, i have a little table for store languages with an id, name and creation_date fields. I am trying to map the raw query results into a namedtuple as the docs recommend, it that way i can manage rows in a more readable way, so here is my namedtuple.

LanguageRecord = namedtuple('LanguageRecord', 'id, name, creation_date')

The code that the docs suggest for the mapping is as follows:

for language in map(LanguageRecord._make, c.fetchall()):
  # do something with languages

This is fine when i want to return a collection of languages but in this case i want just to retrieve one language:

c.execute('SELECT * FROM language WHERE name=?', (name,))

So my first attempt it was something like this:

language = map(LanguageRecord._make, c.fetchone())

This code doesn't works because fetchone() returns a tuple instead a list with one tuple, so the map function tries to create three namedtuples one for each tuple field thought.

My first approach to solve this was to explicitly create a list and append to it the tuple result, something like:

languages = []
languages.append(c.fetchone())
for language in map(LanguageRecord._make, languages):
  # do something with language

My second approach was to use fetchall() although i just want one record. I can set the name field with a unique constrain in the database in order to garantize just one result.

for language in map(LanguageRecord._make, c.fetchall()):
  # do something with languages

Another approach could be use fetchall()[0] without the unique constrain to garantize just one result.

My question is which is the best and common way to deal with this problem, should i use always fetchall to maintain a common interface and let the database manage the uniqueness logic? or should i create a list explicitly as in approach 1? Is there a more easy way to accomplish this task?

Foi útil?

Solução

There is a much easier way! Sqlite3 provides a way for the user to define "row factories". These row factories take the cursor and the tuple row and can return whatever type of object it wants.

Once you set the row factory with

con.row_factory = my_row_factory

then rows returned by the cursor will be the result of my_row_factory applied to the tuple-row. For example,

import sqlite3
import collections

LanguageRecord = collections.namedtuple('LanguageRecord', 'id name creation_date')
def namedtuple_factory(cursor, row):
    return LanguageRecord(*row)

con = sqlite3.connect(":memory:")
con.row_factory = namedtuple_factory
cur = con.cursor()
cur.execute("select 1,2,3")
print(cur.fetchone())

yields

LanguageRecord(id=1, name=2, creation_date=3)

For another example of how to define a namedtuple factory, see this post.


By the way, if you set

conn.row_factory = sqlite3.Row

then rows are returned as dicts, whose keys are the table's column names. Thus, instead of accessing parts of the namedtuple with things like row.creation_date you could just use the builtin sqlite3.Row row factory and access the equivalent with row['creation_date'].

Outras dicas

An improved row_factory is actually this, which can be reused for all sorts of queries:

from collections import namedtuple

def namedtuple_factory(cursor, row):
    """Returns sqlite rows as named tuples."""
    fields = [col[0] for col in cursor.description]
    Row = namedtuple("Row", fields)
    return Row(*row)

conn = sqlite3.connect(":memory:")
conn.row_factory = namedtuple_factory
cur = con.cursor()

There is another one row_factory on the top of namedtuple:

from collection import namedtuple

def namedtuple_factory(cursor, row, cls=[None]):
    rf = cls[0]
    if rf is None:
        fields = [col[0] for col in cursor.description]
        cls[0] = namedtuple("Row", fields)
        return cls[0](*row)
    return rf(*row)

One can generalize further in order to use other class factories:

def make_row_factory(cls_factory, **kw):
    def row_factory(cursor, row, cls=[None]):
        rf = cls[0]
        if rf is None:
            fields = [col[0] for col in cursor.description]
            cls[0] = cls_factory("Row", fields, **kw)
            return cls[0](*row)
        return rf(*row)
    return row_factory

These factory functions are useful for cases when all query results have same fields.

Examples:

  1. namedtuple_factory = make_row_factory(namedtuple)

  2. import dataclass

    row_factory = make_row_factory(dataclass.make_dataclass)

  3. pip3 install recordclass

    import recordclass

    row_factory = make_row_factory(recordclass.make_dataclass, fast_new=True)

Here are some performance counters to compare different ways (debian linux, 64 bit, python 3.9).

Script for creation test database:

N = 1000000
conn = sqlite3.connect('example.db')
c = conn.cursor()
c.execute('''CREATE TABLE test
             (id int, x real, y real, p int, q int)''')
gen = ((i, random(), random(), randint(0,N), randint(0,N)) for i in range(N))
c.executemany("INSERT INTO test VALUES (?,?,?,?,?)", gen)
conn.commit()
conn.close()

Default:

conn = sqlite3.connect('example.db')
c = conn.cursor()
%time res = [row for row in c.execute("SELECT id,x,y,p,q FROM test")]
conn.close()
print(N * sys.getsizeof(res[0]) // 1000000, 'Mb')

CPU times: user 971 ms, sys: 92.1 ms, total: 1.06 s
Wall time: 1.06 s
80 Mb

sqlite3.Row:

conn = sqlite3.connect('example.db')
conn.row_factory = sqlite3.Row
c = conn.cursor()
%time res = [row for row in c.execute("SELECT id,x,y,p,q FROM test")]
conn.close()
# print(N * sys.getsizeof(res[0]) // 1000000, 'Mb')

CPU times: user 1.11 s, sys: 80.1 ms, total: 1.19 s
Wall time: 1.19 s

namedtuple:

from collections import namedtuple Row = namedtuple("Row", "id x y p q") conn = sqlite3.connect('example.db') c = conn.cursor() %time res = [Row(*row) for row in c.execute("SELECT id,x,y,p,q FROM test")] conn.close() print(N * sys.getsizeof(res[0]) // 1000000, 'Mb')

CPU times: user 1.89 s, sys: 71.8 ms, total: 1.96 s
Wall time: 1.96 s
80 Mb

namedtuple-based row factory:

conn = sqlite3.connect('example.db')
conn.row_factory = make_row_factory(namedtuple)
c = conn.cursor()
%time res = [row for row in c.execute("SELECT id,x,y,p,q FROM test")]
conn.close()
print(N * sys.getsizeof(res[0]) // 1000000, 'Mb')

CPU times: user 1.93 s, sys: 116 ms, total: 2.05 s
Wall time: 2.05 s
80 Mb

recordclass:

from recordclass import make_dataclass
Row = make_dataclass("Row", "id x y p q", fast_new=True)
conn = sqlite3.connect('example.db')
c = conn.cursor()
%time res = [Row(*row) for row in c.execute("SELECT id,x,y,p,q FROM test")]
conn.close()
print(N * sys.getsizeof(res[0]) // 1000000, 'Mb')

CPU times: user 1 s, sys: 72.2 ms, total: 1.08 s
Wall time: 1.07 s
56 Mb

recordclass-based row factory:

conn = sqlite3.connect('example.db')
conn.row_factory = make_row_factory(make_dataclass, fast_new=True)
c = conn.cursor()
%time res = [row for row in c.execute("SELECT id,x,y,p,q FROM test")]
conn.close()
print(N * sys.getsizeof(res[0]) // 1000000, 'Mb')

CPU times: user 1.11 s, sys: 76.2 ms, total: 1.19 s
Wall time: 1.19 s
56 Mb

I demonstrate how to take a resulting dataframe from a sql query and convert it into a namedtuple list. I did not dynamic bind the dataframe columns to the nametuple names, not sure if this is possible.

LanguageRecord=namedtuple('Generic',['id','name','creation_date'])
def map_to_language_record(row):
    return LanguageRecord(row.id, row.name, row.creation_date)

df=pd.DataFrame({'id':[1,2,3],'name':['bob','dick','jane'],'creation_date': 
  ['1/1/2021','1/2/2021','1/3/2021']})

languages = list(map(map_to_language_record, df.itertuples()))
print(languages)

output:

[Generic(id=1, name='bob', creation_date='1/1/2021'), Generic(id=2, name='dick', creation_date='1/2/2021'), Generic(id=3, name='jane', creation_date='1/3/2021')]

I think better to use for language in map(LanguageRecord._make, c.fetchall()[:1]): Because it can cause IndexError with fetchall()[0].

If you need one result and there is already "WHERE" in query. As I understand query should return one row. Early optimization is evil. :)

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top