Query optimization ndbcluster + order by
02-11-2019
Question
I'm not all that familiar with MySQL's ndbcluster storage engine, but I recently wrote an application that runs some very simple queries. However, a co-worker of mine (who is now on vacation, or I'd ask him about it, of course) left a TODO in the comments next to what I thought was a pretty harmless query.
The comment read "TODO: Avoid sorting on PK when querying on MySQL cluster"
Granted, there are some things I'm aware of when working on a cluster (like using fewer JOINs), but I gathered that index (and therefore PK) lookups were effectively O(1) operations, so a simple ORDER BY someID DESC wouldn't hurt performance.
I've been looking around on Google, but I can't quite find a definitive answer as to what this ORDER BY means on a cluster. I can make a few educated guesses, but still...
What's more, I can't really think of a query I could use instead. Basically, I have a generic method that lets me fetch the last N rows based on 1 or 2 values. The simplified version of the query looks something like this:
SELECT *
FROM db.tbl
WHERE user_Id = ?
  AND req_action = ?
ORDER BY tbl_Id DESC
LIMIT 0,?; -- 0,1 is default, though
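To make the query easier to reason about, here is a minimal sketch of a table it could run against. Only the column names come from the query above; the column types, the AUTO_INCREMENT key and the secondary index name idx_user_action are assumptions for illustration, not the real schema.

```sql
-- Hypothetical schema: types, AUTO_INCREMENT and the index name are assumed;
-- only the table and column names are taken from the query above.
CREATE TABLE db.tbl (
    tbl_Id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_Id    INT UNSIGNED    NOT NULL,
    req_action VARCHAR(32)     NOT NULL,
    PRIMARY KEY (tbl_Id),
    -- Secondary index covering the two columns used in the WHERE clause.
    KEY idx_user_action (user_Id, req_action)
) ENGINE=NDBCLUSTER;
```

With a layout like this, the WHERE clause filters on the secondary index, while the ORDER BY sorts on the primary key, which is the combination the TODO comment is about.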
Most of the time, though, the limit is 0,1, so I came up with this alternative approach that lets me drop the ORDER BY clause, but it reduces the flexibility of my method rather significantly:
SELECT *
FROM db.tbl
WHERE tbl_Id = (
    SELECT MAX(tbl_Id)
    FROM db.tbl
    WHERE user_Id = ?
      AND req_action = ?);
But in that case, I'm always going to end up with either an empty resultset, or a single row.
My questions:
- Is it true that ORDER BY on the primary key is not advisable on a cluster?
- If so, what alternative constructs should I be looking at to maintain the flexibility of my initial query, while sticking to good practices for cluster queries?
Things to keep in mind:
- The dataset I'm querying is certain to have hundreds of thousands, if not millions of rows.
- The table will be used heavily for both read and write operations; 30 RW/s is not exceptional.
- I can write stored procedures, but we only use them as a last resort.
- An EXPLAIN EXTENDED of both queries would have me believe that the first (my initial query) is still preferable, but I ran it on a vagrant box, and just noticed the test tables use the InnoDB storage engine, so I take it I can't trust those results :)
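For anyone wanting to repeat the comparison on the right engine, the check could look roughly like the sketch below. It assumes a test environment that actually has NDB data nodes available. The values 42 and 'login' are placeholders, not real data, and plain EXPLAIN is used because EXPLAIN EXTENDED is no longer accepted by newer MySQL versions.

```sql
-- Check which storage engine the test table actually uses.
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'db'
  AND TABLE_NAME = 'tbl';

-- Assumption: NDB is available on the test box. This rebuilds the table,
-- so it is only meant for a throwaway test copy.
ALTER TABLE db.tbl ENGINE = NDBCLUSTER;

-- Plan for the original query (placeholder parameter values).
EXPLAIN
SELECT *
FROM db.tbl
WHERE user_Id = 42
  AND req_action = 'login'
ORDER BY tbl_Id DESC
LIMIT 0, 1;

-- Plan for the MAX() alternative (same placeholder values).
EXPLAIN
SELECT *
FROM db.tbl
WHERE tbl_Id = (
    SELECT MAX(tbl_Id)
    FROM db.tbl
    WHERE user_Id = 42
      AND req_action = 'login');
```

Plans from a small vagrant dataset may still differ from production, so this would only show which access paths the optimizer considers, not real timings.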
No correct solution