Question

I'm using App Engine with the eventually consistent High Replication Datastore (HRD). I'm also using sharded counters.

When I query for all of the shards and sum them up, can I assume the counts are strongly consistent? That is, will the code below return an accurate sum of my shard counts?

total = 0  # avoid shadowing the built-in sum()
for counter in Counter.all():
    total += counter.count

Solution

If you want to create strongly consistent sharded counters, you should use keys, not queries.

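# Getting the total: fetch every shard by key instead of by query
from google.appengine.ext import db

total = 0
shard_keys = []
for i in range(20):  # 20 shards
    key_name = 'shard' + str(i)
    shard_keys.append(db.Key.from_path('Counter', key_name))
counters = db.get(shard_keys)  # None is returned for shards that don't exist yet
for counter in counters:
    if counter:
        total += counter.count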

# Incrementing a shard
import random
key_name = 'shard' + str(random.randrange(20))  # choose a random shard, 0..19
counter = Counter.get_by_key_name(key_name)  # try to retrieve it from the datastore
if not counter:
    counter = Counter(key_name=key_name)  # shard doesn't exist yet, create it
counter.count += 1  # assumes the count property defaults to 0
db.put(counter)

Perform the increment inside a transaction; otherwise two concurrent read-modify-write cycles on the same shard can overwrite each other and lose an update.
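A minimal sketch of what a transactional increment might look like, assuming the old `google.appengine.ext.db` API and a `Counter` model with an integer `count` property. The names `NUM_SHARDS`, `shard_key_name`, `increment_in_txn` and `increment` are made up for illustration, and passing the `db` module and the model class in as arguments is only a convenience so the sketch can be exercised without the SDK installed.

```python
import random

NUM_SHARDS = 20  # assumed shard count, matching the code above


def shard_key_name(index):
    """Build the key name used for shard number `index` (hypothetical helper)."""
    return 'shard' + str(index)


def increment_in_txn(counter_model, key_name):
    """Read-modify-write a single shard; intended to run inside a transaction."""
    counter = counter_model.get_by_key_name(key_name)
    if counter is None:
        # Shard doesn't exist yet, so create it starting from zero.
        counter = counter_model(key_name=key_name, count=0)
    counter.count += 1
    counter.put()
    return counter.count


def increment(db_module, counter_model):
    """Pick a random shard and increment it via db.run_in_transaction."""
    key_name = shard_key_name(random.randint(0, NUM_SHARDS - 1))
    return db_module.run_in_transaction(increment_in_txn, counter_model, key_name)
```

In an App Engine handler this would be called as `increment(db, Counter)` after `from google.appengine.ext import db`. Each transaction touches only one shard entity, so contention stays spread across the shards.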

Other tips

No. Even fetching by key, you cannot rely on a strongly consistent count (though it will be more up to date than it would otherwise). Batch get operations are not transactional, so one of the shards could be updated while you are fetching them.
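To make that concrete, here is a toy, pure-Python model of the interleaving. It does not touch the datastore; the function name and the `writes_between` schedule format are invented for this illustration, and the shards are read one at a time only to make the race visible.

```python
def read_sum_with_interleaving(shards, writes_between):
    """Sum the shards one at a time while other writers keep incrementing.

    writes_between[i] lists the shard indexes that get incremented
    right after shard i has been read (a hypothetical schedule).
    """
    total = 0
    for i in range(len(shards)):
        total += shards[i]
        for j in writes_between.get(i, []):
            shards[j] += 1  # a concurrent increment lands mid-read
    return total


shards = [5, 3, 2]  # true total is 10 when the read starts
observed = read_sum_with_interleaving(shards, {0: [0, 2]})
print(observed)     # 11: the increment to shard 0 was missed, the one to shard 2 was seen
print(sum(shards))  # 12: the true total once the read has finished
```

The reader sees neither the total at the start of the read nor the total at the end, which is the sense in which the batch get is not a consistent snapshot.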

Asking for strong consistency here is kind of meaningless, however. First, in a distributed system like App Engine, simultaneity is a fuzzy concept at the best of times - synchronization requires coordination, which creates bottlenecks. Second, even if you could get a transactional sum of the counter values, it'd be out of date the moment you fetched it, since the counters can be updated immediately after you read them anyway.

Queries are eventually consistent in HRD, so you cannot be sure that the entities you get back from a query are up to date. If the query filters on an entity property that is being updated, then the query might not even find the entity.

You can increase the probability that the sharded counter total reflects the current state, but you cannot (as far as I know) get that probability to 100%.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow