Question

Consider a GAE (python) app that lets users comment on songs. The expected number of users is 1,000,000+. The expected number of songs is 5,000.

The app must be able to:

  • Give the number of songs a user has commented on
  • Give the number of users who have commented on a song

Counter management must be transactional so that the counts always reflect the underlying data.

It seems GAE apps must keep these types of counts calculated at all times since querying for them at request time would be inefficient.

My Data Model

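from google.appengine.ext import db

# BaseModel is not shown here; it is presumably a db.Model subclass.
class Song(BaseModel):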
    name = db.StringProperty()
    # Number of users commenting on the song
    user_count = db.IntegerProperty('user count', default=0, required=True)
    date_added = db.DateTimeProperty('date added', False, True)
    date_updated = db.DateTimeProperty('date updated', True, False)

class User(BaseModel):
    email = db.StringProperty()
    # Number of songs commented on by the user
    song_count = db.IntegerProperty('song count', default=0, required=True)
    date_added = db.DateTimeProperty('date added', False, True)
    date_updated = db.DateTimeProperty('date updated', True, False)

class SongUser(BaseModel):
    # Will be child of User
    song = db.ReferenceProperty(Song, required=True, collection_name='songs')
    comment = db.StringProperty('comment', required=True)
    date_added = db.DateTimeProperty('date added', False, True)
    date_updated = db.DateTimeProperty('date updated', True, False)

Code
This handles the user's song count transactionally but not the song's user count.

s = Song(name='Hey Jude')
s.put()

u = User(email='me@example.com')
u.put()

def add_mapping(song_key, song_comment, user_key):
    u = User.get(user_key)

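    # SongUser declares 'comment' (not 'song_comment') and has no 'user' property;
    # the user is already recorded as the entity's parent.
    su = SongUser(parent=u, song=song_key, comment=song_comment)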
    u.song_count += 1

    u.put()
    su.put()

# Transactionally add mapping and increase user's song count
db.run_in_transaction(add_mapping, s.key(), 'Awesome', u.key())

# Increase song's user count (non-transactional)
s.user_count += 1
s.put()

The question is: How can I manage both counters transactionally?

Based on my understanding this would be impossible since User, Song, and SongUser would have to be a part of the same entity group. They can't be in one entity group because then all my data would be in one group and it could not be distributed by user.


Solution

You really shouldn't have to worry about handling the user's count of songs on which they have commented inside a transaction because it seems unlikely that a User would be able to comment on more than one song at a time, right?

Now, it is definitely the case that many users could be commenting on the same song at one time, so that is where you have to worry about making sure that the data isn't made invalid by a race condition.

However, if you keep the count of users who have commented on a song inside the Song entity and lock that entity with a transaction, you are going to get very high contention on it, and the resulting datastore timeouts will cause lots of problems for your application.

The answer to this problem is sharded counters.

To make sure that you can create a new SongUser entity and update the related Song's sharded counter together, consider making the related Song the parent of the SongUser entity. That puts them in the same entity group, so you can both create the SongUser and update the sharded counter in the same transaction. The SongUser's relationship to the User who created it can be held in a ReferenceProperty.

Regarding your concern about the two updates (the transactional one and the User update) not both succeeding: that is always a possibility. Since either update can fail, you will need proper exception handling to make sure both succeed. That's an important point: in-transaction updates are not guaranteed to succeed. You may get a TransactionFailedError exception if the transaction cannot complete for any reason.

So, if your transaction completes without raising an exception, run the update to User in a transaction of its own. That will get you automatic retries of the update to User, should some error occur. Unless there's something about possible contention on the User entity that I don't understand, the possibility that it will not eventually succeed is vanishingly small. If that is an unacceptable risk, then I don't think App Engine has a perfect solution to this problem for you.
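
The two-step flow described above might look roughly like the sketch below. It is plain Python rather than datastore code: `TransactionFailedError` is a local stand-in for the datastore exception of the same name, and `update_with_retries`, `comment_on_song`, and the two callables passed to it are hypothetical names used only for illustration.

```python
class TransactionFailedError(Exception):
    """Local stand-in for the datastore's transaction failure exception."""


def update_with_retries(update_fn, max_attempts=3):
    """Call update_fn until it succeeds or max_attempts is used up.

    Returns True on success, False if every attempt raised
    TransactionFailedError.
    """
    for _ in range(max_attempts):
        try:
            update_fn()
            return True
        except TransactionFailedError:
            continue  # try again
    return False


def comment_on_song(add_comment_txn, bump_user_count_txn):
    """Run the two updates in order and report how far they got."""
    # Step 1: create the comment and update the song's counter together.
    if not update_with_retries(add_comment_txn):
        return 'comment_failed'
    # Step 2: only after step 1 succeeded, update the user's song count.
    if not update_with_retries(bump_user_count_txn):
        # The comment exists but the user's count is now off by one.
        return 'user_count_stale'
    return 'ok'
```

The `'user_count_stale'` outcome is the small remaining risk mentioned above: the caller could log it and correct the user's count later.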

First ask yourself: is it really that bad if the count of songs that someone has commented on is off by one? Is this as critical as updating a bank account balance or completing a stock sale?

Licensed under: CC-BY-SA with attribution