Question

For performance reasons, I've got a denormalized database where some tables contain data which has been aggregated from many rows in other tables. I'd like to maintain this denormalized data cache by using SQLAlchemy events. As an example, suppose I was writing forum software and wanted each Thread to have a column tracking the combined word count of all comments in the thread in order to efficiently display that information:

import uuid

from sqlalchemy import Column, ForeignKey, Integer, UnicodeText
from sqlalchemy.dialects.postgresql import UUID  # assuming a PostgreSQL UUID column
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Thread(Base):
    __tablename__ = 'thread'

    id = Column(UUID, primary_key=True, default=uuid.uuid4)
    title = Column(UnicodeText(), nullable=False)
    word_count = Column(Integer, nullable=False, default=0)

class Comment(Base):
    __tablename__ = 'comment'

    id = Column(UUID, primary_key=True, default=uuid.uuid4)
    thread_id = Column(UUID, ForeignKey('thread.id', ondelete='CASCADE'), nullable=False)
    thread = relationship('Thread', backref='comments')
    message = Column(UnicodeText(), nullable=False)

    @property
    def word_count(self):
        return len(self.message.split())

Every time a comment is inserted (for the sake of simplicity, let's say that comments are never edited or deleted), I want to update the word_count attribute on the associated Thread object. So I'd want to do something like:

def after_insert(mapper, connection, target):
    thread = target.thread
    thread.word_count = sum(c.word_count for c in thread.comments)
    print("updated cached word count to", thread.word_count)

event.listen(Comment, "after_insert", after_insert)

So when I insert a Comment, I can see the event firing and see that it has correctly calculated the word count, but that change is not saved to the Thread row in the database. I don't see any caveats about updating other tables in the after_insert documentation, though I do see such caveats on some of the other events, such as after_delete.

So is there a supported way to do this with SQLAlchemy events? I'm already using SQLAlchemy events for lots of other things, so I'd like to do everything that way instead of having to write database triggers.

Solution

The after_insert() event is one way to do this, and you might notice that it is passed a SQLAlchemy Connection object rather than a Session, as is the case with other flush-related events. The mapper-level flush events are normally intended to be used to invoke SQL directly on the given Connection:

@event.listens_for(Comment, "after_insert")
def after_insert(mapper, connection, target):
    thread_table = Thread.__table__
    thread = target.thread
    # compute the new total once so we can both persist it and report it
    new_count = sum(c.word_count for c in thread.comments)
    connection.execute(
        thread_table.update().
        where(thread_table.c.id == thread.id).
        values(word_count=new_count)
    )
    print("updated cached word count to", new_count)

What is notable here is that issuing the UPDATE statement directly is also a lot more performant than running that attribute change through the whole unit-of-work process again.

However, an event like after_insert() isn't really needed here, as we know the value of "word_count" before the flush even happens. We actually know it as soon as Comment and Thread objects are associated with each other, so we could just as well keep Thread.word_count completely fresh in memory at all times using attribute events:

def _word_count(msg):
    # tolerate None and the NO_VALUE symbol passed for a previously unset attribute
    return len(msg.split()) if isinstance(msg, str) else 0

@event.listens_for(Comment.message, "set")
def receive_message_set(target, value, oldvalue, initiator):
    if target.thread is not None:
        target.thread.word_count += _word_count(value) - _word_count(oldvalue)

@event.listens_for(Comment.thread, "set")
def receive_thread_set(target, value, oldvalue, initiator):
    # the new Thread, if any
    if isinstance(value, Thread):
        value.word_count += _word_count(target.message)

    # the old Thread, if any (oldvalue may be a NO_VALUE symbol rather than None)
    if isinstance(oldvalue, Thread):
        oldvalue.word_count -= _word_count(target.message)

The great advantage of this method is that there's no need to iterate through thread.comments, which for an unloaded collection would mean another SELECT being emitted.
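
As an informal illustration of that point (the engine/Session setup is assumed here, and the sample data is made up), the cached value is already correct in memory before anything is flushed. Note that word_count is passed explicitly, because the Column default of 0 only applies at INSERT time:

thread = Thread(title="Denormalized caches", word_count=0)
session.add(thread)

c1 = Comment(thread=thread, message="first comment here")
c2 = Comment(thread=thread, message="a second slightly longer comment")
session.add_all([c1, c2])

# already correct in memory: no flush, no SELECT
print(thread.word_count)  # 8

session.commit()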

Still another method is to do it in before_flush(). Below is a quick and dirty version, which can be refined to more carefully analyze what has changed in order to determine whether word_count actually needs to be updated; one possible refinement is sketched after it:

@event.listens_for(Session, "before_flush")
def before_flush(session, flush_context, instances):
    for obj in session.new | session.dirty:
        if isinstance(obj, Thread):
            obj.word_count = sum(c.word_count for c in obj.comments)
        elif isinstance(obj, Comment) and obj.thread is not None:
            obj.thread.word_count = sum(c.word_count for c in obj.thread.comments)
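
As a sketch of one such refinement (an illustration of the idea, not part of the original answer), the attribute history exposed by inspect() can be used so that the Comment branch of the listener above only recomputes when the message attribute was actually set or modified:

from sqlalchemy import inspect

@event.listens_for(Session, "before_flush")
def before_flush(session, flush_context, instances):
    for obj in session.new | session.dirty:
        if not isinstance(obj, Comment) or obj.thread is None:
            continue
        # recompute only if this Comment's message was set or changed
        if inspect(obj).attrs.message.history.has_changes():
            obj.thread.word_count = sum(
                c.word_count for c in obj.thread.comments
            )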

I'd go with the attribute event method, as it is the most performant and keeps the value up to date in memory at all times.

OTHER TIPS

You can do this with SQLAlchemy-Utils aggregated columns: http://sqlalchemy-utils.readthedocs.org/en/latest/aggregates.html
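
For reference, the Thread/Comment example in that documentation looks roughly like the sketch below; it maintains a comment_count, and a word_count version would additionally need a SQL-side word-counting expression, so treat the column details here as illustrative rather than exact:

import sqlalchemy as sa
from sqlalchemy.orm import relationship
from sqlalchemy_utils import aggregated

class Thread(Base):
    __tablename__ = 'thread'

    id = sa.Column(sa.Integer, primary_key=True)
    title = sa.Column(sa.UnicodeText, nullable=False)

    # SQLAlchemy-Utils recalculates this column at flush time as comments are added or removed
    @aggregated('comments', sa.Column(sa.Integer, nullable=False, default=0))
    def comment_count(self):
        return sa.func.count('1')

    comments = relationship('Comment', backref='thread')

class Comment(Base):
    __tablename__ = 'comment'

    id = sa.Column(sa.Integer, primary_key=True)
    thread_id = sa.Column(sa.Integer, sa.ForeignKey('thread.id'))
    message = sa.Column(sa.UnicodeText, nullable=False)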

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow