Question

I have two related tables already populated with data (using SQLAlchemy ORM). However, the individual records are not linked yet, i.e. the foreign key column is left empty. I need to bulk update the foreign key column, based on a match on different column(s).

To illustrate:

class Location(Base):
    __tablename__ = 'locations'
    id = Column(Integer, primary_key=True)
    x = Column(Float)
    y = Column(Float)


class Stopover(Base):
    __tablename__ = 'stopovers'
    id = Column(Integer, primary_key=True)
    x = Column(Float)
    y = Column(Float)
    location_id = Column(Integer, ForeignKey("locations.id"))
    location = relationship("Location", backref=backref("stopovers"))

So essentially, I need to associate each of 20,000+ 'Stopover' records with a 'Location', by matching the 'x' and 'y' columns - i.e., bulk-update the location_id column.

This code generates the _location_id_ correctly:

for stpvr in session.query(Stopover).all():
       stpvr.location_id = session.query(Location.id).\
                           filter_by(x=stpvr.x).\
                           filter_by(y=stpvr.y).one()[0]
       session.commit()

However, it doesn't seem to be working: exploring the database through Sqliteman shows that the location_ids haven't been updated. Besides, I'm guessing there must be a far more elegant way of approaching this.

In the docs, I found Correlated Updates to be closest to what I'm looking for. However, the docs only refer to the SQL Expression Language, whereas I'm using the ORM. I'm new to SQLAlchemy, and my attempts to translate the docs to ORM haven't been successful.

I would appreciate any help in finding the most elegant way of performing this bulk update. Thanks in advance.


Solution

SQLAlchemy works in layers. At the base layer, it provides things such as a unified interface to databases through various database drivers, and a connection pool implementation. Above this sits the SQL Expression Language, which lets you define the tables and columns of your database as Python objects and then use those objects to build SQL expressions with the APIs SQLAlchemy provides. Then there is the ORM, which builds on these existing layers, so even if you use the ORM you can still drop down to the expression API. You are one level above even that, using the declarative model (which builds on the ORM).

Most of the expression API is based on the SQLAlchemy Table object and its columns. The table is accessible through the __table__ attribute of the mapped class, and the columns are available as attributes of the mapped class. So even though you are at the declarative level, you can still use much of the expression API with the models you mapped declaratively. So, the example correlated query...

>>> stmt = select([addresses.c.email_address]).\
...             where(addresses.c.user_id == users.c.id).\
...             limit(1)
>>> conn.execute(users.update().values(fullname=stmt)) 

...can translate to a declarative ORM model by using the __table__ attribute and declarative columns...

>>> stmt = select([Addresses.email_address]).\
...             where(Addresses.user_id == Users.id).\
...             limit(1)
>>> conn.execute(Users.__table__.update().values(fullname=stmt)) 

Here is what I believe your correlated query would look like:

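from sqlalchemy import and_, select

# Correlated subquery: the id of the location whose coordinates match the stopover
stmt = select([Location.id]).\
    where(and_(Location.x==Stopover.x, Location.y==Stopover.y)).limit(1)

conn.execute(Stopover.__table__.update().values(location_id=stmt))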
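
The snippet above uses the older select([...]) calling style. As a minimal sketch of the same correlated update in the newer style, assuming SQLAlchemy 1.4 or later: the in-memory SQLite engine, the sample rows, and the names matching_location and linked are my own illustrative assumptions, not part of the question.

```python
# Sketch assuming SQLAlchemy 1.4+; engine URL and sample data are made up.
from sqlalchemy import (Column, Float, ForeignKey, Integer, and_,
                        create_engine, select, update)
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Location(Base):
    __tablename__ = "locations"
    id = Column(Integer, primary_key=True)
    x = Column(Float)
    y = Column(Float)


class Stopover(Base):
    __tablename__ = "stopovers"
    id = Column(Integer, primary_key=True)
    x = Column(Float)
    y = Column(Float)
    location_id = Column(Integer, ForeignKey("locations.id"))


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Location(id=1, x=1.0, y=2.0),
        Location(id=2, x=3.0, y=4.0),
        Stopover(id=10, x=3.0, y=4.0),
        Stopover(id=11, x=1.0, y=2.0),
        Stopover(id=12, x=9.0, y=9.0),  # has no matching location
    ])
    session.commit()

    # Scalar subquery that refers to the stopovers row being updated.
    matching_location = (
        select(Location.id)
        .where(and_(Location.x == Stopover.x, Location.y == Stopover.y))
        .limit(1)
        .scalar_subquery()
    )
    # One UPDATE for the whole table; skip in-session object synchronization.
    stmt = (
        update(Stopover)
        .values(location_id=matching_location)
        .execution_options(synchronize_session=False)
    )
    session.execute(stmt)
    session.commit()

    # Read back the stopover id -> location_id pairs.
    linked = {
        stopover_id: location_id
        for stopover_id, location_id in session.execute(
            select(Stopover.id, Stopover.location_id)
        )
    }
```

Because the statement has no WHERE clause of its own, every stopover row is written, and a row without a matching location ends up with a NULL location_id.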

The SQL generated for the Stopover.__table__.update() statement:

UPDATE stopovers SET location_id=(SELECT locations.id 
FROM locations 
WHERE locations.x = stopovers.x AND locations.y = stopovers.y
LIMIT ? OFFSET ?)
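
To check what that SQL does without going through SQLAlchemy at all, here is a small sketch using Python's built-in sqlite3 module. The table definitions mirror the models in the question; the sample rows are invented, and I am assuming the two placeholders are bound to 1 (LIMIT) and 0 (OFFSET).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (id INTEGER PRIMARY KEY, x FLOAT, y FLOAT);
    CREATE TABLE stopovers (
        id INTEGER PRIMARY KEY, x FLOAT, y FLOAT,
        location_id INTEGER REFERENCES locations (id)
    );
    INSERT INTO locations (id, x, y) VALUES (1, 1.0, 2.0), (2, 3.0, 4.0);
    INSERT INTO stopovers (id, x, y)
        VALUES (10, 3.0, 4.0), (11, 1.0, 2.0), (12, 9.0, 9.0);
""")

# The UPDATE shown above, with LIMIT and OFFSET bound to 1 and 0 (assumed values).
conn.execute(
    """
    UPDATE stopovers SET location_id = (
        SELECT locations.id
        FROM locations
        WHERE locations.x = stopovers.x AND locations.y = stopovers.y
        LIMIT ? OFFSET ?
    )
    """,
    (1, 0),
)
conn.commit()

rows = conn.execute(
    "SELECT id, location_id FROM stopovers ORDER BY id"
).fetchall()
conn.close()
```

The subquery is evaluated once per stopovers row, so the whole table is linked in a single statement instead of one query per record.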
Licensed under: CC-BY-SA with attribution