Question

I want to build a backend for a mobile game that includes a "real-time" global leaderboard for all players, for events that last a certain number of days, using Google App Engine (Python).

A typical usage would be as follows: - User starts and finishes a combat, acquiring points (2-5 mins for a combat) - Points are accumulated in the player's account for the duration of the event. - Player can check the leaderboard anytime. - Leaderboard will return top 10 players, along with 5 players just above and below the player's score.

Now, there is no real constraint on the real-time aspect, the board could be updated every 30 seconds, to every hour. I would like for it to be as "fast" as possible, without costing too much.

Since I'm not very familiar with GAE, this is the solution I've thought of:

  • Each Player entity has a event_points attribute
  • Using a Cron job, at a regular interval, a query is made to the datastore for all players whose score is not zero. The query is sorted.
  • The cron job then iterates through the query results, writing back the rank in each Player entity.

When I think of this solution, it feels very "brute force".

The problem with this solution lies with the cost of reads and writes for all entities. If we end up with 50K active users, this would mean a sorted query of 50K+1 reads, and 50k+1 writes at regular intervals, which could be very expensive (depending on the interval)

I know that memcache can be a way to prevent some reads and some writes, but if some entities are not in memcache, does it make sense to query it at all? Also, I've read that memcache can be flushed at any time anyway, so unless there is a way to "back it up" cheaply, it seems like a dangerous use, since the data is relatively important.

Is there a simpler way to solve this problem?

Was it helpful?

Solution

You don't need 50,000 reads or 50,000 writes. The solution is to set a sorting order on your points property. Every time you update it, the datastore will update its order automatically, which means that you don't need a rank property in addition to the points property. And you don't need a cron job, accordingly.

Then, when you need to retrieve a leader board, you run two queries: one for 6 entities with more or equal number of points with your user; second - for 6 entities with less or equal number of points. Merge the results, and this is what you want to show to your user.

As for your top 10 query, you may want to put its results in Memcache with an expiration time of, say, 5 minutes. When you need it, you first check Memcache. If not found, run a query and update the Memcache.

EDIT:

To clarify the query part. You need to set the right combination of a sort order and inequality filter to get the results that you want. According to App Engine documentation, the query is performed in the following order:

  1. Identifies the index corresponding to the query's kind, filter properties, filter operators, and sort orders.
  2. Scans from the beginning of the index to the first entity that meets all of the query's filter conditions.
  3. Continues scanning the index, returning each entity in turn, until it encounters an entity that does not meet the filter conditions, or reaches the end of the index, or has collected the maximum number of results requested by the query.

Therefore, you need to combine ASCENDING order with GREATER_THAN_OR_EQUAL filter for one query, and DESCENDING order with LESS_THAN_OR_EQUAL filter for the other query. In both cases you set the limit on the results to retrieve at 6.

One more note: you set a limit at 6 entities, because both queries will return the user itself. You can add another filter (userId NOT_EQUAL to your user's id), but I would not recommend it - the cost is not worth the savings. Obviously, you cannot use GREATER_THAN/LESS_THAN filters for points, because many users may have the same number of points.

OTHER TIPS

Here is a Google Developer article explaining similar problem and the solution using the Google code JAM ranking library. Further help and extension to this library can be discussed in the Google groups forum.

The library basically creates a N-ary tree with each node containing the count of the scores in a particular range. The score ranges are further divided all the way down till leaf node where its a single score . A tree traversal ( O log(n) ) can be used to find the number of players with score higher than the specific score. That is the rank of the player. It also suggests to aggregate the score submission requests in a pull taskqueue and then process them in a batch in a background thread in a backend.

Whether this is simpler or not is debatable.

I have assumed that ranking is not just a matter of ordering an accumulation of points, in which case thats just a simple query. I ranking involves other factors rather than just current score.

I would consider writing out an Event record for each update of points for a User (effectively a queue) . Tasks run collecting all the current Event records, In addition you maintain a set of records representing the top of the leaderboard. Adjust this set of records, based on the incoming event records. Discard event records once processed. This will limit your reads and writes to only active events in a small time window. The leader board could probably be a single entity, and fetched by key and cached.

I assume you may have different ranking schemes like current active rank (for the current 7 days), vs all time ranks. (ie players not playing for a while won't have a good current rank).

As the players view their rank, you can do that with two simple queries Players.query(Players.score > somescore).fetch(5) and Players.query(Players.score < somescore).fetch(5) this shouldn't cost too much and you could cache them.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top