Question

I converted some tasks to run on a dynamic backend.

The tasks are failing silently (no logged error, no retry, nothing) about 20% of the time (min 10%, max 60%, measured over a large, long-term sample). Switching the task away from the backend restores retries and gets the failure rate back to ~0%.

Any ideas?


Solution

Converting the tasks to a backend exacerbated the problem but wasn't the root cause.

I had specified a task_retry_limit and the queue was a push queue. With a backend the number of instances is specified, so a burst of tasks can arrive faster than those instances can accept them. (I believe you can replicate this issue on the frontend by rapidly ramping up requests to a large number.)

Tasks were failing with 503: Instance Unavailable until they hit the task_retry_limit, at which point they were dropped. The failures are visible temporarily in Task Queues, but they do not show up in Logs.
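
To make the failure mode concrete, here is a rough simulation in plain Python. It is not App Engine code: deliver_push_task and make_handler are hypothetical names, and I am assuming a task_retry_limit of N means one initial attempt plus up to N retries. The point is only that a task which keeps getting 503 runs out of retries and vanishes without the handler ever executing, so nothing is logged.

```python
def make_handler(unavailable_for):
    """Hypothetical handler: answers 503 for the first `unavailable_for`
    calls (no free instance), then 200."""
    calls = {"n": 0}

    def handler():
        calls["n"] += 1
        return 503 if calls["n"] <= unavailable_for else 200

    return handler


def deliver_push_task(handler, task_retry_limit):
    """Simulate a push queue delivering one task.

    Assumption: one initial attempt plus up to `task_retry_limit` retries.
    Returns the number of attempts on success, or None if the task was
    dropped after exhausting its retries.
    """
    attempts = 0
    while attempts <= task_retry_limit:
        attempts += 1
        status = handler()
        if 200 <= status < 300:
            return attempts
    # Every attempt got a non-2xx response; the task is silently gone.
    return None


# The instance frees up in time: the task succeeds on the third attempt.
recovered = deliver_push_task(make_handler(unavailable_for=2), task_retry_limit=3)

# The instance stays saturated: the task is dropped after four attempts.
dropped = deliver_push_task(make_handler(unavailable_for=5), task_retry_limit=3)
```

In this sketch a higher (or absent) retry limit only postpones the drop; it does not help if the instances stay saturated.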

I should be using pull queues. Even if my use case was a poor fit, I'd probably +1 a request that a task dying from repeated 503: Instance Unavailable responses log something, so it doesn't look like a phantom task.
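
For comparison, this is roughly why a pull queue avoids the problem: the worker asks for tasks only when it has capacity, so there is no request that can be turned away with a 503. The sketch below uses an in-memory stand-in, not the App Engine API; InMemoryPullQueue and work_once are hypothetical names, and the lease time and batch size are arbitrary.

```python
import time


class InMemoryPullQueue:
    """Hypothetical stand-in for a pull queue (not the App Engine API)."""

    def __init__(self):
        self._tasks = {}         # task id -> payload
        self._leased_until = {}  # task id -> lease expiry timestamp
        self._next_id = 0

    def add(self, payload):
        self._next_id += 1
        self._tasks[self._next_id] = payload
        return self._next_id

    def lease(self, lease_seconds, max_tasks, now=None):
        """Hand out up to max_tasks tasks whose lease has expired or never existed."""
        now = time.time() if now is None else now
        leased = []
        for task_id, payload in self._tasks.items():
            if len(leased) >= max_tasks:
                break
            if self._leased_until.get(task_id, 0) <= now:
                self._leased_until[task_id] = now + lease_seconds
                leased.append((task_id, payload))
        return leased

    def delete(self, task_id):
        self._tasks.pop(task_id, None)
        self._leased_until.pop(task_id, None)

    def __len__(self):
        return len(self._tasks)


def work_once(queue, process, now=None):
    """Lease a batch, process it, and delete only the tasks that succeeded.

    Failed tasks stay in the queue and become leasable again once their
    lease expires; the failures are returned so the caller can log them.
    """
    failed = []
    for task_id, payload in queue.lease(lease_seconds=60, max_tasks=10, now=now):
        try:
            process(payload)
        except Exception as exc:
            failed.append((task_id, repr(exc)))
        else:
            queue.delete(task_id)
    return failed
```

A failed task here is never lost: it stays queued until some worker processes it successfully, and the worker is the one that decides what to log.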

OTHER TIPS

Which runtime are you using on the backend? Try running the backend for a bit without dynamic set to true and exercise the failing component.

On my project, I have seen tasks that target a static backend disappear on occasion, but nowhere near the rate you are seeing.

Licensed under: CC-BY-SA with attribution