Question

I'm building a voting site on GAE that will work like this:

  • Voting is declared open (but remains open only for a minute or two).
  • People cast their votes.
  • Voting closes. No more votes can be cast.
  • The results are displayed.

The voting phase only lasts a minute or two, so lots of votes will be cast within a short period.

I want to avoid Datastore contention, so I can't store the votes in an entity group (it will likely exceed the ~1 write/sec limit).

However, I must ensure I count ALL the votes, once voting has closed.

My question is: How can I ensure Datastore consistency for the votes (without an entity group), once the voting has closed? In other words, at what point can I be sure every vote has been written to (and is readable from) the Datastore?

Only once I know every single vote is readable can I safely calculate the results.

PS: I should note this is not a "simple" voting scheme; voters choose their 1st, 2nd and 3rd choices, and the winner is determined by a rather complex iterative process, i.e. it's not sufficient to count how many votes go to each candidate.

Thanks in advance!

Solution

My 2c.

Suppose you use the Users service. I'll use a vote handler that takes a choice as the input from the user. I won't use ancestors, so every vote will be a root entity. The user's user_id is unique, so we can use it as the key for the vote.

Depending on the performance you need, there are 3 choices. Let's see why, starting with the first 2.

Approach 1 - Blind writes (No transactions)

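import webapp2
from google.appengine.api import users
from google.appengine.ext import ndb

class Vote(ndb.Model):
  # Assumed model: the user's choice, None until a vote is cast.
  selection = ndb.StringProperty()

class VoteHandler(webapp2.RequestHandler):

  def get(self, choice):
    user = users.get_current_user()

    # Blind write it, or check first whether the vote already exists.
    Vote(id=user.user_id(), selection=choice).put()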

In the first approach we just write the entity. We don't use a transaction, so no request has to wait on or retry against another. We just write. We could also do a get with eventual consistency first, just to check whether the vote exists and maybe save further writes. But that check is not atomic with the put, and that's the problem: many writes can happen in between, and the stored value will always be that of the last write.

Approach 2 - get_or_insert (small transactions)

class VoteHandler(webapp2.RequestHandler):

  def get(self, choice):

    user = users.get_current_user()

    # Construct a vote with the user_id as a key using get_or_insert
    vote = Vote.get_or_insert(user.user_id())

    # Check if he has voted (general check with default entity prop to None)
    if vote.selection is not None:
      # vote is cast return or do other logic
      return

    vote.selection = choice
    vote.put()

Since the user_id is the key for the vote, you can then get each vote by key, and a get by key is strongly consistent. This way each user has exactly one vote, holding one or more selections if needed.
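To illustrate why reading by key matters once voting has closed, here is a minimal sketch in plain Python. It is a toy model, not the App Engine API: ToyStore, query_keys, apply_index_updates and the voter_ids list are all hypothetical names, and the lagging index only imitates the way a non-ancestor query in the Datastore described here could trail behind recent writes.

```python
class ToyStore:
    """Toy model: gets by key see every write; queries read a lagging index."""

    def __init__(self):
        self._entities = {}   # key -> value, updated on every put
        self._index = set()   # keys currently visible to queries
        self._pending = []    # keys written but not yet indexed

    def put(self, key, value):
        self._entities[key] = value
        self._pending.append(key)

    def get(self, key):
        # Lookup by key: always reflects the latest put.
        return self._entities.get(key)

    def query_keys(self):
        # Query: only sees keys whose index update has been applied.
        return set(self._index)

    def apply_index_updates(self):
        self._index.update(self._pending)
        self._pending = []


store = ToyStore()
voter_ids = ["user-1", "user-2", "user-3"]  # hypothetical list of known voters
for user_id in voter_ids:
    store.put(user_id, "candidate-x")

stale = store.query_keys()                               # index has not caught up
by_key = [store.get(user_id) for user_id in voter_ids]   # every vote is readable

print(len(stale), len(by_key))
```

In this model the query misses all three votes until the index catches up, while the lookups by key return every one of them. The sketch assumes the list of voter keys is available from somewhere; the answer does not say where it would come from.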

As for get_or_insert, it runs a get inside a transaction and inserts the entity only if it is missing, like so (this example uses the older db API and a Story model):

def txn(key_name, **kwds):
  entity = Story.get_by_key_name(key_name, parent=kwds.get('parent'))
  if entity is None:
    entity = Story(key_name=key_name, **kwds)
    entity.put()
  return entity

def get_or_insert(key_name, **kwargs):
  return db.run_in_transaction(txn, key_name, **kwargs)

get_or_insert('some key', title="The Three Little Pigs")

In the second approach I used get_or_insert at the start and later just checked whether the selection property is set. Depending on that condition we save or not. Beware: a concurrent request might already have set the selection property.

Some thoughts on this:

Because the key is the user_id, I know that only concurrent requests from the same user can trigger this behaviour.

Basically, if a user initiates 2 concurrent vote requests, there is a chance that:

  • Both check whether the Vote entity exists.
  • One inserts the entity.
  • The other gets the entity.

But both of them may see the selection property as None, and both will then try to write. The last write wins, and you end up with 2 or more writes (more if there were more requests).
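The interleaving described above can be replayed by hand in plain Python. This is only a sketch: the dictionary stands in for the Datastore, and get_or_insert and put here are hypothetical helpers that mimic the calls in Approach 2, not the ndb methods themselves.

```python
# In-memory stand-in for the Datastore: user_id -> stored selection.
# This is a toy model, not the App Engine API.
store = {}
write_count = 0


def get_or_insert(user_id):
    """Create the vote with selection None if it is missing, then return it."""
    store.setdefault(user_id, None)
    return store[user_id]


def put(user_id, selection):
    """Unconditionally overwrite the stored selection."""
    global write_count
    store[user_id] = selection
    write_count += 1


# Two concurrent requests from the same user, interleaved by hand.
seen_by_a = get_or_insert("user-1")   # request A reads None
seen_by_b = get_or_insert("user-1")   # request B also reads None

if seen_by_a is None:
    put("user-1", "candidate-x")      # A writes its choice
if seen_by_b is None:
    put("user-1", "candidate-y")      # B overwrites it: the last write wins

print(store["user-1"], write_count)   # candidate-y 2
```

Both requests pass the None check because each read happened before either write, which is exactly the window the check-then-put pattern leaves open.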

Approach 3 - Transactional

class VoteHandler(webapp2.RequestHandler):

  def get(self, choice):
    user = users.get_current_user()
    self.vote(user.user_id(), choice)

  @ndb.transactional()
  def vote(self, key, choice):
    # Check and write in one transaction so the two steps cannot interleave.
    vote = ndb.Key(Vote, key).get()
    if vote:
      # The user has already voted.
      return
    # Store the vote and return its key.
    return Vote(id=key, selection=choice).put()

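In this case everything goes smoothly, but the Vote root entity is contended until each transaction completes: any other transaction on it will be retried if one or more are already in flight. Since the key is the user_id, that only affects concurrent requests from the same user.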
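The effect of making the check and the write a single step can be sketched in plain Python. This is an analogy under stated assumptions, not how the Datastore implements transactions: the votes dictionary, txn_lock and results list are hypothetical, and a process-wide lock simply stands in for the isolation that the transaction provides.

```python
import threading

# Toy stand-in for the Datastore: user_id -> selection. Not the App Engine API.
votes = {}
# The lock plays the role of the transaction: check and write happen as one step.
txn_lock = threading.Lock()
results = []


def vote(user_id, choice):
    """Store the vote only if this user has none yet; record whether it was stored."""
    with txn_lock:
        if user_id in votes:
            results.append(False)   # the user has already voted
            return
        votes[user_id] = choice
        results.append(True)


# Three concurrent requests from the same user.
threads = [threading.Thread(target=vote, args=("user-1", c))
           for c in ("candidate-x", "candidate-y", "candidate-z")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results.count(True), len(votes))   # 1 1
```

Which of the three choices is stored depends on thread scheduling, but exactly one request succeeds and the others see that the vote already exists.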

Choose wisely, and I would like to see more answers/opinions/approaches.

OTHER TIPS

Have a look at sharded counters. It's a GAE design pattern for scenarios where a large number of writes are expected within a short time on a single entity or entity group.
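The idea behind the pattern can be sketched in plain Python. This is a minimal in-memory illustration, not the App Engine implementation: in a real app each shard would be its own Datastore entity, and NUM_SHARDS, increment and total are hypothetical names chosen for this sketch.

```python
import random

NUM_SHARDS = 20  # assumed shard count; more shards spread the writes further

# Each slot stands in for one shard entity holding a partial count.
shards = [0] * NUM_SHARDS


def increment():
    """Add one to a randomly chosen shard so writes are spread across shards."""
    shards[random.randrange(NUM_SHARDS)] += 1


def total():
    """Read every shard and sum the partial counts."""
    return sum(shards)


for _ in range(1000):
    increment()

print(total())   # 1000
```

Writes land on a random shard, so no single entity takes the full write rate, and the total is recovered by summing the shards at read time. As the question notes, a plain tally is not enough to pick a winner from ranked choices, so a counter like this would at most track how many ballots were cast.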

Licensed under: CC-BY-SA with attribution