Converting it to a backend exacerbated the problem but wasn't the problem.
I had specified a task_retry_limit
and the queue was a push queue. With a backend the number of instances is specified. (I believe you can replicate this issue on the frontend by ramping up requests rapidly, to a big number).
Tasks were failing 503: Instance Unavailable
until they hit the task_retry_limit
. This is visible temporarily in Task Queues, but will not show up in Logs.
I should be using pull queues. Even if my use case was stupid I'd probably +1 a task dying due to multiple 503: Instance Unavailable
logging something so it doesn't appear like a phantom task.