Question

I'd like some advice on the best way to do a strongly consistent read/write in Google App Engine.

My data is stored in a class like this.

from google.appengine.ext import ndb

class UserGroupData(ndb.Model):
  users_in_group = ndb.StringProperty(repeated=True)
  data = ndb.StringProperty(repeated=True)

I want to write a safe update method for this data. As far as I understand, I need to avoid eventually consistent reads here, because they risk data loss. For example, the following code is unsafe because it uses a vanilla query which is eventually consistent:

def update_data(user_id, additional_data):
  entity = UserGroupData.query(UserGroupData.users_in_group==user_id).get()
  entity.data.append(additional_data)
  entity.put()

If the entity returned by the query is stale, data is lost.

In order to achieve strong consistency, it seems I have a couple of different options. I'd like to know which option is best:

Option 1:

Use get_by_id(), which is always strongly consistent. However, there doesn't seem to be a neat way to do this here. There isn't a clean way to derive the key for UserGroupData directly from a user_id, because the relationship is many-to-one. It also seems kind of brittle and risky to require my external clients to store and send the key for UserGroupData.

Option 2:

Place my entities in an ancestor group, and perform an ancestor query. Something like:

def update_data(user_id, additional_data):
  entity = UserGroupData.query(UserGroupData.users_in_group==user_id,
                               ancestor=ancestor_for_all_ugd_entities()).get()
  entity.data.append(additional_data)
  entity.put()

I think this should work, but putting all UserGroupData entities into a single ancestor group seems like an extreme thing to do. It results in writes being limited to ~1/sec. This seems like the wrong approach, since each UserGroupData is actually logically independent. Really what I'd like to do is perform a strongly consistent query for a root entity. Is there some way to do this? I noticed a suggestion in another answer to essentially shard the ancestor group. Is this the best that can be done?

Option 3:

A third option is to do a keys_only query followed by get_by_id(), like so:

def update_data(user_id, additional_data):
  entity_key = UserGroupData.query(
      UserGroupData.users_in_group==user_id).get(keys_only=True)
  entity = entity_key.get()
  entity.data.append(additional_data)
  entity.put()

As far as I can see this method is safe from data loss, since my keys are not changing and the get() gives strongly consistent results. However, I haven't seen this approach mentioned anywhere. Is this a reasonable thing to do? Does it have any downsides I need to understand?


Solution

I think you are conflating two separate issues: inconsistent queries and safe updates of the data.

A query like the one in your example, UserGroupData.query(UserGroupData.users_in_group==user_id).get(), only ever returns a single entity (the first match), provided the user_id is in a group.

If the user_id has only just been added and the index is not yet up to date, you won't get a record back, and therefore you won't update the record.

Any update, irrespective of how the entity was fetched, should be performed inside a transaction to ensure update consistency.

As for ancestors improving the consistency of the query, it's not obvious whether you plan to have multiple UserGroupData entities matching a user. If you do, why are you doing a get()?

So option 3 is probably your best bet: do the keys-only query, then do the Key.get() and the update inside a transaction. Remember that cross-group transactions are limited to 5 entity groups.

Given this approach, if the index the query is based on is out of date, then one of three things can happen:

  1. The record you want isn't found, because the newly added user_id is not yet reflected in the index.
  2. The record you want is found, and the get() fetches it consistently.
  3. The record you want is found, but the user_id has actually been removed and the index is out of date. The get() retrieves the entity consistently, and the user_id is not present in it.

Your code can then decide what course of action to take.
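To make the three cases concrete, here is a minimal sketch of option 3 with the read and write wrapped in a transaction. It assumes the UserGroupData model from the question and the legacy google.appengine.ext.ndb library. The helper name apply_update and the status strings it returns are made up for illustration, and the guarded import is only there so the helper can be exercised outside App Engine.

```python
try:
    from google.appengine.ext import ndb  # legacy App Engine SDK
except ImportError:
    ndb = None  # lets apply_update() be exercised outside App Engine


def apply_update(entity, user_id, additional_data):
    """Update an entity fetched by key; meant to run inside a transaction.

    Returns an illustrative status string so the caller can decide what to do.
    """
    if entity is None:
        # The index returned a key whose entity has since been deleted.
        return 'missing'
    if user_id not in entity.users_in_group:
        # Case 3: the index still listed the user, but the entity does not.
        return 'stale_index'
    # Case 2: the consistent get() confirms membership, so append and save.
    entity.data.append(additional_data)
    entity.put()
    return 'updated'


def update_data(user_id, additional_data):
    # Eventually consistent keys-only query, used only to locate the key.
    # UserGroupData is the model class defined in the question.
    key = UserGroupData.query(
        UserGroupData.users_in_group == user_id).get(keys_only=True)
    if key is None:
        # Case 1: no match, possibly because the index has not caught up.
        return 'not_found'
    # key.get() runs inside the transaction, so the read is strongly
    # consistent and the read-modify-write is applied atomically.
    return ndb.transaction(
        lambda: apply_update(key.get(), user_id, additional_data))
```

The caller can then branch on the returned status, for example retrying later on 'not_found' or logging and skipping on 'stale_index'. Because each call touches a single entity, the transaction stays within one entity group.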

What is the use case for querying all UserGroupData entities that a particular user is a member of that would require updates?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow