Question

Thanks in advance for any and all help!

I'm running a query on the datastore that looks like this:

forks = (Thing.query(ancestor=user.subscriber_key)
         .filter(Thing.status == True,
                 Thing.fork_of == thing_key,
                 Thing.start_date <= user.day_threshold(),
                 Thing.level.IN([1, 2, 3, 4, 5]))
         .order(Thing.level))

This query works and returns the results I expect. However, I would like to sort on one additional field (Thing.last_touched). If I add it to the sort, the query fails because Thing.last_touched is not the property the inequality filter is applied to. I can't add another inequality filter, since only one is allowed, and it isn't needed anyway (in fact, that's why Thing.level.IN is there: it's not needed as a filter, but it is required for the sort).

So, what I'm wondering is, could I run the query with the filters that I want, and then run code to sort the query results myself? I know I could pull all the parameters I want to sort and store them in dictionaries and sort them that way, but it seems to me there ought to be a way to handle this with the query.

I've searched for days for this but have had no luck.

Just in case you need it, here's the class definition of Thing:

from google.appengine.ext import ndb

class Thing(ndb.Model):
    title = ndb.StringProperty()
    level = ndb.IntegerProperty()
    fork = ndb.BooleanProperty()
    recursion_level = ndb.IntegerProperty()
    fork_of = ndb.KeyProperty()
    creation_date = ndb.DateTimeProperty(auto_now_add=True)
    last_touched = ndb.DateTimeProperty(auto_now=True)
    status = ndb.BooleanProperty()
    description = ndb.StringProperty()
    owner_id = ndb.StringProperty()
    frequency = ndb.IntegerProperty()
    start_date = ndb.DateTimeProperty(auto_now_add=True)
    due_date = ndb.DateTimeProperty()

Solution

One of the main reasons that Google AppEngine is so fast, even when dealing with insane amounts of data, is its very limited set of query options. All standard queries are "scans" over an index, i.e. there is some table (the index) that keeps references to your actual data entries, sorted by ONE of the data's properties. So, let's say you add the following entries:

Thing A: start-date = Wednesday (I'm just going to use weekdays for simplicity)
Thing B: start-date = Friday
Thing C: start-date = Monday
Thing D: start-date = Thursday

Then, AppEngine will create an index that looks like this:

1 - Monday    -> Thing C
2 - Wednesday -> Thing A
3 - Thursday  -> Thing D
4 - Friday    -> Thing B

Now, any query will correspond to a contiguous block in this (or another) index. If you, for example, say "All Things with start-date >= Tuesday", it will return the entries in rows 2 through 4 (i.e. Thing A, Thing D, and Thing B, in that exact order!). If you query for "< Thursday", you get rows 1-2. If you say "> Tuesday and <= Thursday", you get rows 2-3.
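To make the "contiguous block" idea concrete, here is a minimal sketch in plain Python. It is not how AppEngine is actually implemented; the sorted list, the day constants, and the scan() helper are all made up for illustration. It only shows that an inequality query boils down to picking a start and end position in a sorted index and returning everything in between, in index order.

```python
import bisect

# Hypothetical stand-in for a single-property index: (start_day, name) pairs
# kept sorted by start_day. Weekdays are numbered Monday=0 ... Friday=4.
MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY = range(5)

index = sorted([
    (WEDNESDAY, "Thing A"),
    (FRIDAY, "Thing B"),
    (MONDAY, "Thing C"),
    (THURSDAY, "Thing D"),
])
keys = [day for day, _ in index]


def scan(low=None, high=None, low_inclusive=True, high_inclusive=True):
    """Return the names in one contiguous slice of the index (illustrative only)."""
    if low is None:
        start = 0
    elif low_inclusive:
        start = bisect.bisect_left(keys, low)
    else:
        start = bisect.bisect_right(keys, low)
    if high is None:
        stop = len(keys)
    elif high_inclusive:
        stop = bisect.bisect_right(keys, high)
    else:
        stop = bisect.bisect_left(keys, high)
    return [name for _, name in index[start:stop]]


# The three example queries from the text above.
at_or_after_tuesday = scan(low=TUESDAY)
before_thursday = scan(high=THURSDAY, high_inclusive=False)
after_tue_up_to_thu = scan(low=TUESDAY, low_inclusive=False, high=THURSDAY)
```

Note that scan() can only ever return a slice of the list: it has no way to skip a row in the middle or to hand rows back in a different order.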

And if you are doing inequality filters on a different property, AppEngine will use a different index.

This is why you can only do one inequality filter, and why the sort order is always determined by the property that you apply the inequality filter to: AppEngine is not designed to return items 1, 2, 4 (with a gap*) out of an index, or items 4, 2, 3 (no gap, but out of order).

So, if you need to sort your entries on a different property other than the one you use for inequality filtering, you basically have 3 choices:

  1. Perform your query with the inequality filter, read all results into memory, and sort them in your code afterwards (I think this is what you mean by storing them in a dictionary)
  2. Perform your query WITHOUT the inequality filter, but sorted on the right property. Then, as you loop over the returned entries, simply check the inequality yourself and drop the ones that don't match
  3. Perform your query with the inequality filter and just return the items in the wrong order, and let the client-application worry about sorting them! ;)
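Options 1 and 2 could look roughly like the sketch below. To keep it runnable outside AppEngine, it uses a hypothetical Row namedtuple and made-up sample data in place of real Thing entities, and it simulates the datastore's part with plain list operations. With real ndb results, the in-memory step of option 1 would presumably be the same sorted(...) call applied to the list returned by the query's fetch().

```python
from collections import namedtuple
from datetime import datetime
from operator import attrgetter

# Hypothetical stand-in for fetched Thing entities (only the relevant fields).
Row = namedtuple("Row", "title level start_date last_touched")

rows = [
    Row("a", 2, datetime(2013, 1, 5), datetime(2013, 2, 1)),
    Row("b", 1, datetime(2013, 1, 9), datetime(2013, 2, 3)),
    Row("c", 1, datetime(2013, 1, 2), datetime(2013, 2, 2)),
    Row("d", 1, datetime(2013, 1, 20), datetime(2013, 2, 1)),  # fails the filter
    Row("e", 2, datetime(2013, 1, 10), datetime(2013, 1, 30)),
]
threshold = datetime(2013, 1, 10)  # assumed value of user.day_threshold()

# Option 1: the datastore applies the inequality filter (simulated here) and
# returns rows in start_date order; we then sort them ourselves in memory.
filtered = sorted(
    (r for r in rows if r.start_date <= threshold),
    key=attrgetter("start_date"),
)
option_1 = sorted(filtered, key=attrgetter("level", "last_touched"))

# Option 2: the datastore returns everything already sorted the way we want
# (simulated here); we drop the rows that fail the inequality as we loop.
presorted = sorted(rows, key=attrgetter("level", "last_touched"))
option_2 = [r for r in presorted if r.start_date <= threshold]

titles_1 = [r.title for r in option_1]
titles_2 = [r.title for r in option_2]
```

Both approaches end up with the same list; they differ only in how many entities are read from the datastore and in whether a sort has to be done in your own code.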

Generally I would assume that you have far more unused resources available client-side to do the sorting, so I would probably go for option 3 in most cases. But if you need to sort the entries server-side (e.g. for a mobile app targeted at older smartphones), whether option 1 or option 2 is better depends on the size of your database and the fraction of entries that usually match your inequality filter. If your inequality filter only removes a small fraction of the entries, option 2 might be much faster (it needs only a linear pass over the results instead of an O(n log n) sort), but if you have a huge database of entries and only a very small number of them will match the inequality, definitely go for option 1.

BTW: The talk "App Engine Datastore Under the Covers" from Google I/O 2008 might be a very helpful resource. It's a bit technical, but it gives a great overview of this topic, and I consider it must-know information if you want to do anything in AppEngine. Note, though, that this talk is a bit outdated. There are a bunch more things that you can do with queries nowadays. But ALL of these extra things (if I understand correctly) are API functions that in the end just generate a set of several simple queries (exactly like the ones described in this talk) and then combine their results in memory in your application (just like you would if you did your own sorting).

*There are some exceptions where AppEngine can generate the intersection of two (or more?) index-scans to drop items from the results, but I don't think that you could use that to change the order of the returned entries.

Licensed under: CC-BY-SA with attribution