Question

The latest Google App Engine release supports a new Task Queue API in Python. I was comparing the capabilities of this API with the existing Cron service for background jobs that are not user-initiated, such as fetching an RSS feed and parsing it once a day. Can and should the Task Queue API be used for non-user-initiated requests like this?


Solution

I'd say "sort of". The things to remember about task queues are:

1) A limit of operations per minute/hour/day is not the same as repeating something at regular intervals. Even with the token bucket size set to 1, I don't think you're guaranteed that those repetitions will be evenly spaced. It depends on how seriously they mean it when they say the queue is implemented as a token bucket, and whether that statement is supposed to be a guaranteed part of the interface. This being Labs, nothing is guaranteed yet.
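A rough way to see point 1 is to simulate a generic token bucket. This is my own simplified model, not App Engine's actual scheduler, and `run_times` and its parameters are hypothetical. Tokens refill at `rate` per minute up to `capacity`, tasks run in FIFO order, and each start consumes one token. A bucket of size 1 only enforces a minimum gap between runs; it does nothing to make the gaps even.

```python
def run_times(arrivals, rate, capacity):
    """Return the start time of each task under a simple token-bucket limiter.

    arrivals: sorted times (minutes) at which tasks become ready.
    rate: tokens added per minute; capacity: the most tokens the bucket holds.
    The bucket starts full; tasks run in FIFO order once a token is available.
    """
    tokens = float(capacity)
    last = 0.0  # time at which `tokens` was last brought up to date
    starts = []
    for ready in arrivals:
        t = max(ready, last)  # FIFO: no task starts before the previous one
        # Refill for the time elapsed since the last update, capped at capacity.
        tokens = min(capacity, tokens + (t - last) * rate)
        if tokens < 1:
            # Wait exactly long enough for one whole token to accumulate.
            t += (1 - tokens) / rate
            tokens = 1.0
        tokens -= 1
        last = t
        starts.append(t)
    return starts


print(run_times([0, 0.5, 5, 5.25], rate=1, capacity=1))  # [0, 1.0, 5, 6.0]
print(run_times([0, 0, 0, 0], rate=1, capacity=3))       # [0, 0, 0, 1.0]
```

With capacity 1, runs are never closer than a minute apart, but the gaps come out as 1, 4 and 1 minutes because they depend on when tasks become ready. With capacity 3, a backlog of tasks runs as a burst at time 0.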

2) If a task fails, it's requeued. If a cron job fails, it's logged and not retried until it's next due. So a cron job behaves differently both from a task that adds a copy of itself and then refreshes your feed, and from a task that refreshes your feed and then adds a copy of itself.
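Point 2 can be illustrated with a small simulation (my own model, not App Engine code). It counts how many copies of itself a self-rescheduling feed task enqueues when the refresh fails a few times before succeeding, assuming the queue retries any attempt whose handler reports an error. The function `copies_enqueued` and the two order labels are hypothetical. For comparison, a cron job makes one attempt per scheduled slot and enqueues nothing.

```python
def copies_enqueued(order, refresh_ok):
    """Count how many next-run copies one self-rescheduling task enqueues.

    order: "copy_first" (enqueue the next copy, then refresh the feed) or
           "refresh_first" (refresh the feed, then enqueue the next copy).
    refresh_ok: one bool per attempt; False means the refresh failed, the
           handler reported an error, and the queue retried the task.
    """
    if order not in ("copy_first", "refresh_first"):
        raise ValueError("unknown order: %r" % (order,))
    copies = 0
    for ok in refresh_ok:
        if order == "copy_first":
            copies += 1  # the copy is enqueued even if the refresh then fails
        if not ok:
            continue     # failed attempt: the queue runs this task again
        if order == "refresh_first":
            copies += 1  # only a successful refresh schedules the next run
        return copies
    # Every listed attempt failed: the task is still being retried, and under
    # "refresh_first" no next run has been scheduled yet.
    return copies


print(copies_enqueued("copy_first", [False, False, True]))     # 3
print(copies_enqueued("refresh_first", [False, False, True]))  # 1
```

With two failures before a success, the copy-first task has enqueued three next runs, each of which will go on to schedule its own, while the refresh-first task schedules nothing at all until it succeeds. Neither matches cron's "fail, log, wait for the next slot".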

It may well be possible to mock up cron jobs using tasks, but I doubt it's worth it. If you're trying to work around a cron job that takes more than 30 seconds to run (or hits any other request limit), you can split the work into pieces and have a cron job that adds all the pieces to a task queue. There was some talk (on the GAE blog?) about asynchronous urlfetch, which might ultimately be the best way of updating RSS feeds.
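A sketch of that split-and-fan-out approach, under some assumptions: the cron handler does no fetching itself; it just groups feed URLs into batches and enqueues one task per batch. `cron_handler`, `chunk`, `enqueue` and the `/tasks/refresh_feeds` URL are all hypothetical. On App Engine, `enqueue` would be replaced by the task queue's add call (something like `taskqueue.add(url=..., params=...)`), and a separate worker handler would refresh the feeds in each batch.

```python
def chunk(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def cron_handler(feed_urls, enqueue, batch_size=5):
    """Hypothetical cron entry point: no fetching here, just fan out.

    Each batch becomes one task, so every task stays well under the
    per-request time limit. `enqueue` stands in for the real queue call.
    Returns the number of tasks enqueued.
    """
    batches = chunk(feed_urls, batch_size)
    for batch in batches:
        enqueue(url="/tasks/refresh_feeds", params={"feeds": ",".join(batch)})
    return len(batches)


# Usage with an in-memory stand-in for the queue:
queue = []
urls = ["http://example.com/feed%d" % i for i in range(12)]
print(cron_handler(urls, lambda **task: queue.append(task)))  # 3
```

Passing `enqueue` in as a parameter keeps the fan-out logic testable without the SDK, and the batch size is the knob for keeping each task inside the request limit.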

OTHER TIPS

I didn't understand the differences very well until I watched the Google I/O talk where they explain it (there is a YouTube video of it, plus the slides from the presentation). The official source is usually the best.

The way I look at it: if I am only parsing one RSS feed, a cron job might be good enough. If I have to parse an arbitrary number of RSS feeds specified at run time by a user or some other system variable, I would choose tasks every time.

I only say this because in the past I had to execute many user-defined Twitter searches at regular intervals, and with cron jobs I ended up building a very bad queuing system to execute the requests that needed to be run. It didn't scale, and it didn't help that the smallest interval a cron job can have is 1 minute (I had more searches to perform than there are minutes in a day).
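The arithmetic behind that last point, as a rough sketch: a cron job that fires every minute gets at most 24 × 60 = 1440 runs per day, so N searches per day means each run has to handle at least ⌈N / 1440⌉ of them. Whether that many fit in one request depends on how long a search takes; the 30-second limit is the one mentioned in the accepted answer, while the per-search time and the function names are assumptions for illustration.

```python
import math

MINUTES_PER_DAY = 24 * 60  # 1440: the most runs a once-a-minute cron job gets per day


def searches_per_run(searches_per_day):
    """Minimum number of searches each once-a-minute cron run must handle."""
    return math.ceil(searches_per_day / MINUTES_PER_DAY)


def fits_in_one_request(n_searches, seconds_each, limit_seconds=30):
    """True if n_searches run one after another finish within the request limit."""
    return n_searches * seconds_each <= limit_seconds


for per_day in (5000, 100000):
    n = searches_per_run(per_day)
    # Assumes roughly 2 seconds per search (a made-up figure).
    print(per_day, n, fits_in_one_request(n, seconds_each=2))
# 5000 4 True
# 100000 70 False
```

At 5000 searches a day, each run needs only 4 searches; at 100,000 a day it needs 70, about 140 seconds of serial work, which is where handing each search to its own task starts to pay off.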

The cool thing about tasks is that you can give them an ETA, so you can say "I would like this to be executed 47 seconds in the future" or "I would like this to be executed at 12:30".
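A small sketch of computing those two kinds of ETA with the standard `datetime` module, assuming all times are naive UTC datetimes. `eta_after` and `next_eta_at` are hypothetical helpers. If I remember the API correctly, these correspond to the `countdown` (seconds) and `eta` (datetime) arguments when adding a task, but the exact signature should be checked against the Task Queue documentation.

```python
from datetime import datetime, time, timedelta


def eta_after(now, seconds):
    """ETA for 'run this N seconds from now'."""
    return now + timedelta(seconds=seconds)


def next_eta_at(now, at):
    """ETA for 'run this at HH:MM': today if that time is still ahead, else tomorrow."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


now = datetime(2009, 6, 1, 12, 0, 0)
print(eta_after(now, 47))              # 2009-06-01 12:00:47
print(next_eta_at(now, time(12, 30)))  # 2009-06-01 12:30:00
```

Rolling over to the next day when 12:30 has already passed keeps the ETA in the future, which is the usual intent for a "run at this time" task.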

Licensed under: CC-BY-SA with attribution