Question

I have an app on GAE that lets users add/edit posts at any arbitrary path (like a wiki). I am storing all the posts in a single table, structured as follows:

class WikiPosts(db.Model):
    path = db.StringProperty(required = True)
    content = db.TextProperty(required = True)
    date_created = db.DateTimeProperty(auto_now_add = True)

On the home page I want to display the latest post for each path.

My question is similar to this one ( Select first row in each GROUP BY group? ) but the answers involve using a join, which is not possible in GAE.

I can have a dedicated field to keep track of the latest post for each path, but is it possible to do it using a GQL query?

As of now, I am using this query which returns all the versions of all the wiki posts sorted by their creation time.

db.GqlQuery("SELECT * FROM WikiPosts ORDER BY date_created DESC LIMIT 10")

Solution

Since you do not have a unique list of paths, and since GAE does not support the equivalent of SQL's SELECT DISTINCT (see here, here, and here), you'll have to

  1. generate that list every time the home page is displayed (not recommended once you exceed a few hundred posts), or
  2. create another table/model to keep track of unique paths as new posts are added (then use this new table in combination with Shay's answer), or
  3. as you stated, keep track of the latest post per path.
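
For option 1, a minimal sketch of the in-memory pass, assuming the posts have already been fetched newest first (for example with the ORDER BY date_created DESC query from the question). Post and latest_per_path are hypothetical names used for illustration; Post only stands in for a WikiPosts entity and is not part of the GAE API:

```python
from collections import OrderedDict, namedtuple

# Hypothetical stand-in for a WikiPosts entity; only the fields used here.
Post = namedtuple("Post", ["path", "content", "date_created"])


def latest_per_path(posts_newest_first, limit=10):
    """Keep the first (i.e. newest) post seen for each path."""
    latest = OrderedDict()
    for post in posts_newest_first:
        if post.path not in latest:
            latest[post.path] = post
            # Stop as soon as enough distinct paths have been collected.
            if len(latest) == limit:
                break
    return list(latest.values())
```

In the worst case this walks every version of every post, which is why it stops being practical as the table grows.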

I think option 3 is your best bet, since (as is often the case with GAE) you would be putting into the datastore exactly what you want to get out, i.e., making writes more complicated in favor of quick reads.
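
The bookkeeping for option 3 could look roughly like the sketch below. It is plain Python with a dict standing in for the datastore, so LatestPostIndex, record and home_page are hypothetical names rather than GAE calls. On GAE the dict would presumably become a second model keyed by path, updated in the same request that saves the new WikiPosts version:

```python
class LatestPostIndex(object):
    """Hypothetical stand-in for a second datastore kind keyed by path."""

    def __init__(self):
        self._latest = {}  # path -> (date_created, content)

    def record(self, path, content, date_created):
        """Call on every add/edit: keep only the newest version per path."""
        current = self._latest.get(path)
        if current is None or date_created >= current[0]:
            self._latest[path] = (date_created, content)

    def home_page(self, limit=10):
        """Return (path, content) pairs, most recently edited first."""
        ordered = sorted(
            self._latest.items(), key=lambda kv: kv[1][0], reverse=True
        )
        return [(path, entry[1]) for path, entry in ordered[:limit]]
```

The home page then reads one small entry per path instead of scanning every version of every post.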

OTHER TIPS

You can call:

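def get_latest_posts(path, amount=10):
  # Newest versions of a single path; fetch() applies the limit.
  query = db.GqlQuery("SELECT * FROM WikiPosts WHERE path = :1 ORDER BY date_created DESC", path)
  return query.fetch(amount)

This returns the latest posts for one path, so you need to call it once for each path (with amount=1 to get only the newest version).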

Running a query on every page visit has several costs: under load you may be running more instances, and you will have to contend with eventual consistency in the index that supports the query. Since your page is just a top-ten list, why not manage it with a memcache object? If the object is not found, run the query and write the result to memcache. When a post is added or edited, rewrite the memcache entry before the entity put(), which might be a good candidate for a deferred function (task queue).
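
A rough sketch of that read-through pattern, under stated assumptions: DictCache is a hypothetical in-process stand-in exposing get/set the way a memcache client does, HOME_KEY is a made-up cache key, and run_query is whatever function performs the datastore query. None of these names come from the original post:

```python
class DictCache(object):
    """Hypothetical in-process stand-in with memcache-like get/set."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None signals a cache miss

    def set(self, key, value):
        self._data[key] = value


HOME_KEY = "home_latest_posts"  # hypothetical cache key


def get_home_posts(cache, run_query):
    """Read path: serve from the cache, fall back to the query on a miss."""
    posts = cache.get(HOME_KEY)
    if posts is None:
        posts = run_query()
        cache.set(HOME_KEY, posts)
    return posts


def on_post_saved(cache, path, content, limit=10):
    """Write path: move the edited path to the front of the cached list."""
    posts = cache.get(HOME_KEY)
    if posts is None:
        return  # nothing cached yet; the next read rebuilds it from the query
    posts = [(path, content)] + [p for p in posts if p[0] != path]
    cache.set(HOME_KEY, posts[:limit])
```

With this split, home page requests hit the datastore only after the cached entry has been evicted, and each edit touches a single cached list.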

Licensed under: CC-BY-SA with attribution