Question

I have a "queue" of about a million entities on google app engine. I have to "pop" items off of the queue by using a query.

There are a bunch of client processes running all over the place that are constantly making requests to the stack. My problem is that when one of the clients requests an item, I want to make sure that I am removing that item from the front of the queue, sending it to that client process, and no other processes.

Currently, I am querying for the item, modifying its properties so that a query to the queue no longer includes that item, then saving the item. Using this method, it is very common for one item to be sent to more than one client process at the same time. I suspect this is because there is a delay to when I am making the writes and when they are being reflected to other processes.

Perhaps I need to be using transactions in some way, but when I looked into that, there were a couple of "gotchas". What is a good way to approach this problem?

Was it helpful?

Solution

I see two ways to tackle this:

  1. What you are doing is ok, you just need to use transactions. If your processes are longer then 30s then you can offload them to task queue, which can be a part of transaction.

  2. You could use Pull Queues, where you fill up a queue and than client processes pull tasks from the queue in atomic fashion (lease-delete cycle). With Pull Queues you can be sure that task is leased only once. Also task must be manually deleted from queue after it's done, meaning if your process dies task will be put back in queue after lease expires.

OTHER TIPS

Is there any reason not to implement the "queue" using App Engine's TaskQueue API? If size of the queue is the problem, TaskQueue could contain up to 200 million Tasks for a paid app, so a million entities would be easily handled.

If you want to be able to simulate queries for a certain task in the queue, you could use task tags, and have your client process pull tasks with a certain tag to be processed. Note that pulling tasks is supported through pull queues rather than push queues.

Other than that, if you want to keep your "queue-as-entities" implementation, you could use the Memcache API to signal the client process which entity need to be processed. Memcache provides stronger consistency when you need to share data between instances of your app compared to the eventual consistency of the HRD datastore, with the caveat that data in Memcache could be lost at any point in time.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top