Question

I have a project which still uses delayed_job as its job-processing queue. I've recently found an edge case that is making me question a few things: I have an AR object (I'm using MySQL, by the way) which, on update, sends a message to all the elements of a has_many association. In order to do that, I have to instantiate every element of the association and call the message on each of them. It seemed fair enough to delay that call for each one of them.

Now the association has grown quite a bit; in an edge case I have 40,000 objects belonging to it. Sending the message therefore now involves the (synchronous) creation of 40,000 delayed_job jobs. Since these happen inside an after_update callback and not an after_commit, they all (ab)use the same connection, without any context-switching to other connections. Short version: I have a pipeline of 1 UPDATE statement and 40,000 INSERTs on the same connection, and for that reason the update is taking quite a few minutes in production.

Now, there are a lot of ways around this: change the callback to an after_commit, create 1 (synchronous) delayed job which in turn creates the 40,000 jobs (I don't want to handle the 40,000 AR objects in a single job; the 40,000 of today will be 120,000 tomorrow, and that's memory armageddon), etc. etc...
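
Roughly, those first two workarounds combined would look something like this sketch (Parent, Member, FanOutJob and notify! are placeholder names, assuming delayed_job's standard Delayed::Job.enqueue / .delay API):

```ruby
class Parent < ActiveRecord::Base
  has_many :members

  # Enqueue a single job after the transaction commits, instead of
  # 40,000 enqueues inside the after_update callback.
  after_commit :enqueue_fan_out, on: :update

  def enqueue_fan_out
    Delayed::Job.enqueue FanOutJob.new(id)
  end
end

# A delayed_job custom job is any object that responds to #perform.
FanOutJob = Struct.new(:parent_id) do
  def perform
    # find_each loads members in batches, so memory stays bounded
    # while the 40,000 per-member jobs are created in the background.
    Parent.find(parent_id).members.find_each do |member|
      member.delay.notify!
    end
  end
end
```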

But what I'm really considering is switching my delayed-processing queue to Resque or Sidekiq. They use Redis, so write performance is far better, and because they use something other than MySQL, the job writes won't block the connection doing the update. My only concern is: how much would 40,000 writes at once to Redis cost me? And: do any of these options first store the jobs in memory, so the response to the client isn't blocked, and write them to Redis later? So, my real question is: how much would this delaying delay me in such an edge case?

Solution

Indeed, Redis can process writes faster than MySQL. Try running redis-benchmark; you'll see figures of 100k+ writes/sec.
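
If you'd rather get a rough number from Ruby than from redis-benchmark, a quick sketch along these lines works (the queue key and payload are made up, and a real Sidekiq/Resque push carries a bit more bookkeeping per job):

```ruby
require 'json'
require 'redis'
require 'benchmark'

redis   = Redis.new
payload = { 'class' => 'MemberNotificationJob', 'args' => [[1]] }.to_json

# Time 40,000 queue pushes -- a rough stand-in for enqueueing 40k jobs.
seconds = Benchmark.realtime do
  40_000.times { redis.lpush('bench:queue', payload) }
end
puts "40k pushes in #{seconds.round(2)}s"
```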

do any of these options first store the jobs in memory, so the response to the client isn't blocked, and write them to Redis later?

No, they do it synchronously.

I don't want to handle the 40,000 AR objects in a single job

Maybe you should try a hybrid approach: process chunks of N objects per job. Batched writes should be faster than 40k individual writes, and it scales well (the batch size stays the same whether you have 40k or 400k items).
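
With Sidekiq, that hybrid could look roughly like this (MemberNotificationJob and notify! are placeholder names; push_bulk writes the job payloads to Redis in large batches instead of one round trip per job):

```ruby
class MemberNotificationJob
  include Sidekiq::Worker

  # Each job handles one bounded chunk of ids, so memory use per job
  # stays flat whether the association holds 40k or 400k records.
  def perform(member_ids)
    Member.where(id: member_ids).find_each(&:notify!)
  end
end

# Enqueue one job per chunk of 500 ids.
member_ids = parent.members.pluck(:id)
Sidekiq::Client.push_bulk(
  'class' => MemberNotificationJob,
  'args'  => member_ids.each_slice(500).map { |chunk| [chunk] }
)
```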

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow