Question

I have a table like this:

CREATE TABLE IF NOT EXISTS `jobs` (
    `job_id` varchar(36) NOT NULL,
    `job_status` varchar(30) NOT NULL,
    `created_at` datetime NOT NULL,
    `lease_date` datetime,
    `priority` int NOT NULL,
    PRIMARY KEY(`job_id`),
    INDEX `job_status_priority_lease_date` (`job_status`, `priority`, `lease_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

The full query I want to perform on the table would be the following:

select * from jobs 
where job_status="IN_PROGRESS"
and lease_date<"10minutesago"
order by 
priority
limit 100

I am implementing database as a job queue. 10 min is like the job time out, I only want to act on the jobs that are already timed out. Jobs has different priority so I was ordering by it. Also, I want to make my DB job queue behave like FIFO, that's why my index includes lease_date.

I definitely want my results to be ordered by Priority, e.g. two chunks of the results with Priority 0 and 1. I don't need my results to be strictly ordered by lease_date, as long as they are Ascending order in their own chunks that is good enough for me. e.g. result set:

priority lease_date
0        2017-08-19
0        2018-09-20
1        2016-08-20
1        2018-10-20

Is my index enough for that? How much can I optimize the query and build an index to boost the performance as much as possible?

Was it helpful?

Solution

By "10minutesago" do you mean NOW() - INTERVAL 10 MINUTE?

order by priority DESC sorts only on priority. The ordering within it is unpredictable. It might be ascending, it might be descending, or seemingly random.

Your INDEX(job_status, priority, lease_date) won't get as far as lease_date. It will take care of where job_status="IN_PROGRESS" and it might then walk through the rows in priority order. But that's all.

You could say ORDER BY priority DESC, lease_date DESC, but and lease_date < "10minutesago" will not make use of the index.

Think of building an index thus: First have any number of columns tested with =. Then have one range column.

(After OP's edit)

Your example implies that you need

ORDER BY priority ASC, lease_date ASC

As for the question of efficiency, ...

The WHERE clause needs

INDEX(job_status, lease_date)  -- in this order; adding `priority` won't help

If the optimizer would prefer to focus on the ORDER BY, then

INDEX(job_status, priority, lease_date)  -- in this order

Notes:

  • job_status is first because it is tested with =.
  • Tacking on just lease_date makes it possible to handle the entire WHERE in the index. But there will still be an "filesort" to handle the ORDER BY.
  • Tempting the Optimizer to focus on the ORDER BY with the second index (to avoid the filesort) may or may not backfire. It depends on how much of the data needs to be skipped over because of the lease_date filter.

If you have a thousand rows in the table, simply add both indexes; the query will be "fast enough". If you have a million rows, then you are stuck with a tough problem and should rethink the queuing mechanism. No matter what you do with the indexes, occasionally the wrong index will be used and the query will run painfully slow. Be aware that a hiccup in the system can make a queue sporadically jump from a civilized number of entries to a huge number.

Do you imply that this table is a "queue"? How long does it take to perform an item in the queue? If it is a "short" time, then "Don't queue it, just do it".

Add priority to WHERE

If you change to

select * from jobs 
    where job_status = "IN_PROGRESS"
      AND priority = 1
    and lease_date < NOW() - 10 MINUTE
    order by priority
    limit 100

then either of these becomes optimal:

INDEX(job_status, priority, lease_date)
INDEX(priority, job_status, lease_date)

See my Cookbook for discussion.

If you have a mixture of with and without priority, then have these two:

INDEX(job_status, lease_date),
INDEX(job_status, priority, lease_date)

The Optimizer will pick between them.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top