How should I read multiple queues, pro rata by user and priority?

https://stackoverflow.com/questions/23123052

05-07-2023

Question

I am currently trying to think of ways to replace a MySQL + Cron queuing system with a message queue system (AWS SQS/Beanstalkd/Iron MQ/Redis).

Let's say I have 100 users. These users are able to make API requests to me. Each API request is an SMS which I must send via a single modem which I operate.

Each SMS can have a priority of 1-3.

The problem I am facing is that the single modem is a bottleneck, so I can't simply process the queue in FIFO order: if one user sends 10,000 SMS and I add these to the queue, my other users would not see any SMS go out until the 10,000 for the first user have finished.

Right now, I am using MySQL for the task:

SELECT COUNT(*) AS `count`, `user_id` FROM `queue` GROUP BY `user_id`

This would give me a result like this:

count  | user_id
-------|--------
10000  | 1
1      | 2

I then add the counts together which gives me 10,001 SMS to process.

I do a sum on each row:

(row_count / total_count) * 100 = percentage

E.g.:

(10000 / 10001) * 100 = 99.9900009999%
(1 / 10001) * 100 = 0.0099990001%

I know that my modem can handle 140 SMS per second, so if my Cron runs on a 1 minute cycle, I will send 8,400 SMS in a minute.

I use these calculations to give me my selections:

ceil((8400 / 100) * 99.9900009999) = 8,400 for user #1
ceil((8400 / 100) * 0.0099990001) = 1 for user #2

So in this case, I do a simple MySQL select for each user with a LIMIT, ordering by priority ASC, to give me any priority 1s first, and any priority 3s last.
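As a rough sketch of the calculation above, written in Python purely for illustration: the function name and the dictionary input are hypothetical, and the counts would really come from the GROUP BY query. It skips the percentage step and rounds the share up directly, which gives the same numbers.

```python
def pro_rata_allocation(backlog, capacity):
    """Split `capacity` send slots across users in proportion to their backlog.

    backlog:  dict of user_id -> number of queued SMS (hypothetical input shape)
    capacity: how many SMS the modem can take this cycle, e.g. 140 * 60
    """
    total = sum(backlog.values())
    allocation = {}
    for user_id, count in backlog.items():
        if count == 0:
            continue
        # Integer ceiling of capacity * count / total, so small users get at least 1.
        share = (capacity * count + total - 1) // total
        # Never select more rows than the user actually has queued.
        allocation[user_id] = min(share, count)
    return allocation


print(pro_rata_allocation({1: 10000, 2: 1}, 140 * 60))  # {1: 8400, 2: 1}
```

Each value would then become the LIMIT of that user's SELECT.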

It doesn't matter if we push more than 8,400 to the modem because it will simply queue on the modem, although the modem doesn't guarantee FIFO, so we need to be as tight on the 8,400 per minute as possible. In this case we push 8,401 to the modem.

This is much better, because rather than sending all 10,000 for user 1 first, we only do 8,400 and also get some of user 2's SMS out even though they only have 1 SMS. It's still weighted towards whoever has the most SMS to process, and it keeps in line with the modem throughput too.

Given the fact that I need priorities, I am currently looking at Beanstalkd as my only option.

I figured I could create a queue for each user, and when API requests come in, add the SMS to the user queue along with the priority.

I would then have one worker, which does a count on each queue (some user queues may be empty so I wouldn't want a worker for each user constantly running).

Once the single worker has the queue count for each user, it will start to read each queue up to the maximum number I specify for each user and push to the modem in order.

So in this case, it will read 8,400 SMS for user #1 and 1 SMS for user #2 in that order.

To get SMS to the modem, I have to use HTTP. If I get a 200 OK, I can delete the job. If I get a 500 error, I will not delete the job, so it will be picked up again. For anything else, I will throw an exception and bury the job in Beanstalkd for inspection by a human.
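The rule above boils down to a small decision function. This is only a sketch in Python with hypothetical names; the returned strings are labels for the worker to act on, not calls into any particular Beanstalkd client.

```python
def action_for_status(status_code):
    """Decide what to do with a job based on the modem's HTTP status code."""
    if status_code == 200:
        return "delete"  # sent successfully, remove the job from the queue
    if status_code == 500:
        return "retry"   # do not delete; the job becomes available again
    return "bury"        # anything else: park the job for a human to inspect


for code in (200, 500, 404):
    print(code, action_for_status(code))
```

In Beanstalkd terms, "retry" could mean either releasing the job explicitly or simply letting its time-to-run expire so it returns to the ready queue.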

My concern here is that because I am using HTTP, this is a bottleneck in itself. Ideally I want to perform 8,400 HTTP requests in 1 minute using cURL (140/sec). I am aware that I can use the curl_multi_* functions to perform, say, 10 HTTP requests concurrently to speed this up, but are there any other options to speed things up further?
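For comparison, here is the same idea of a fixed number of in-flight requests as a minimal Python sketch rather than PHP. The names send_batch and fake_send are hypothetical, and fake_send merely stands in for the real HTTP call to the modem.

```python
from concurrent.futures import ThreadPoolExecutor


def send_batch(messages, send_one, max_in_flight=10):
    """Send every message via send_one with at most max_in_flight calls at once.

    Returns the results in the same order as `messages`.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(send_one, messages))


def fake_send(message):
    # Hypothetical stand-in for the HTTP request; always reports success.
    return 200


print(send_batch(["sms-1", "sms-2", "sms-3"], fake_send))  # [200, 200, 200]
```

Because the cap lives in one shared pool, the total number of concurrent requests stays at max_in_flight no matter how many users' messages are in the batch.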

The main issue is that this is blocking, so one user's SMS will go out before all of the other users' SMS. Here we will process 8,400 SMS for user #1, followed by 1 SMS for user #2.

For example, should I think about spawning a worker for each user once I have their total count of messages to process? If I did this, we would process SMS for user #1 and user #2 concurrently. With this option though, I do worry that I cannot control the overall amount of HTTP requests going to the modem, because I do not want to overload it. What happens if I have 100 child workers all doing 10 HTTP requests concurrently to the modem?

These workers would have to be child processes that close once finished. The parent process would need to know about this to then perform another calculation and spawn new child workers.

If anyone has any suggestions on how to handle this scenario of multiple queues with a single queue at the other end (the modem), that would be most helpful.

Solution

My first thought is to use Beanstalkd priorities, and split the messages into groups, each with a different priority.

User 1 wants to send 10,000 msgs.
User 2 wants to send 101 msgs.
messages 1-100 of user 1 are put into the queue at priority 1
messages 101-200 of user 1 are put into the queue at priority 2
messages 201-300 of user 1 are put into the queue at priority 3
messages 301-400 of user 1 are put into the queue at priority 4 (etc)
messages 1-100 of user 2 are put into the queue at priority 1
message 101 of user 2 is put into the queue at priority 2 (etc)

The first 100 messages of each user are sent first (which ones actually leave the gate first depends on when they were put into the queue). As long as no delay is involved (e.g., send after 90 seconds), the messages/jobs closest to priority 0 get sent first.

To make sure that some of every user's messages are sent on every round, I'd limit the maximum priority that you set to the number of customers that you have. That way your biggest customer doesn't end up with a priority of 1,000,000 or more, which would mean that all the rest of their messages had to wait until everyone else had completed. Just restart the priority back at one.
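A minimal sketch of that banding, in Python for illustration only. The function name and its parameters are hypothetical; max_priority is meant to be the number of customers, as suggested above.

```python
def band_priority(message_index, band_size=100, max_priority=100):
    """Priority for a user's Nth message (0-based index).

    Messages are grouped into bands of `band_size`; each band gets the next
    priority, wrapping back to 1 once `max_priority` has been used.
    """
    band = message_index // band_size
    return band % max_priority + 1


# Messages 1-100 -> priority 1, 101-200 -> priority 2, and so on.
print([band_priority(i) for i in (0, 99, 100, 250, 300)])  # [1, 1, 2, 3, 4]
# With only 3 customers, the fourth band wraps back to priority 1.
print(band_priority(300, max_priority=3))  # 1
```

The result would be passed as the job priority when each message is put into Beanstalkd.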

OTHER TIPS

You could get the 8400 messages by getting the same number from each user wanting to send. This approach favors all users equally; if one user has a large backlog, other users are less impacted. Get 10 from each user with a non-empty queue. If there is space left, get another 10 from each. If there are more users wanting to send than slots, choose each 10 from a user picked at random.
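One way this could look, as a Python sketch with hypothetical names; the backlog counts are assumed to be known up front, for example from a per-user queue count.

```python
import random


def equal_share(backlog, capacity, chunk=10, rng=random):
    """Fill `capacity` slots by taking up to `chunk` messages per user per round."""
    remaining = {user: n for user, n in backlog.items() if n > 0}
    taken = {}
    slots = capacity
    while slots > 0 and remaining:
        users = list(remaining)
        rng.shuffle(users)  # random order, so nobody is favoured when slots run short
        for user in users:
            if slots == 0:
                break
            n = min(chunk, remaining[user], slots)
            taken[user] = taken.get(user, 0) + n
            slots -= n
            remaining[user] -= n
            if remaining[user] == 0:
                del remaining[user]  # this user's queue is now empty
    return taken


print(equal_share({1: 10000, 2: 1}, 8400))
```

With the two users from the question, user 2's single message goes out in the first round and user 1 fills the remaining 8,399 slots.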

A proportional approach will trickle out a small user's 100 messages at the same % rate as the large user's 10,000. Two users, one with 10 and one with 10,000, will both finish at the same time. The user with 10 will wonder if there's a service outage.

By sending users at the same rate as each other, 3 users with backlogs of 10 minutes, 1 hour and 5 hours each get 1/3 of the capacity, any onesie-twosie messages would go out immediately, and the 10 minute user will finish much sooner than the 1 hour or 5 hour users, as would be expected.
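To put numbers on that example, here is a small Python sketch with a hypothetical function name. It assumes each backlog is measured in minutes of full modem capacity and that no new messages arrive while the backlogs drain.

```python
def equal_share_finish_times(work):
    """Finish time of each backlog when capacity is split evenly among active users.

    work: dict of user -> backlog, in minutes of full modem capacity.
    """
    finish = {}
    clock = 0
    done = 0  # minutes of work already completed for every still-active user
    pending = sorted(work.items(), key=lambda item: item[1])
    active = len(pending)
    for user, amount in pending:
        # Each active user receives 1/active of the capacity until this one finishes.
        clock += (amount - done) * active
        done = amount
        finish[user] = clock
        active -= 1
    return finish


backlogs = {"10 min": 10, "1 hour": 60, "5 hours": 300}
print(equal_share_finish_times(backlogs))
# {'10 min': 30, '1 hour': 130, '5 hours': 370}
```

Under the same assumptions, a proportional split would have all three users finish together at 370 minutes, the sum of the backlogs.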

Licensed under: CC-BY-SA with attribution