I never posted this description since I hadn't implemented it yet, but the following is a high level description of how a single task can be executed after a large number of required tasks complete on a system such as GAE without depending on a third party framework.
Fan Out
- fanoutCount = # of fan out tasks
- Create and commit table entry with fanoutCount = #, completeCount=0, Byte array with a number of bits = to the fanoutCount
- Since the max table size is 1Mb, this will support millions of jobs. Still might be safest to have an upper limit
- Iterate over the tasks, enqueing them with the entity key, and a task index (incremented count)
- You might want to implement a strategy to recover a partially done fanout task. Tasks must be idempotent, so could just re-start if the same work is identified each time
Task Consumer
- Perform whatever business logic is desired, I've assumed the entity key is a sufficient identifier
- enqueue entry back to a fan in queue with the Job entry record id, and the task index
- Note: Task idempotency. There is no guarantee a task won't execute more than once. Can either
- a) Execute business logic and final queue message in a transaction, checking inside that the business logic has not already been performed
- b) Make individual business logic steps idempotent (fan in will ignore duplicate messages)
Fan In
- Consume the fan in queue
- In transaction
- Get the table entry (candidate for simple caching),
- Set the bit of the array corresponding to the task index to 1 (if it was already 1 we are double processing, just continue)
- Increment the completeCount,
- if completeCount == fanoutCount enqueu your finalization job
- Commit transaction
- Note: Concurrency. The entry can only take soo many transactions soo fast. The logic itself is simple though. Consider processing batch messages, with low (or no) concurrency. Especially if you only have 1 fan out executing at a time