Question

For a large dataset, paginating with an OFFSET is known to be slow and not the best way to paginate. A much better way to paginate is with a cursor, which is just a unique identifier on the row so we know where to continue paginating from where we last left off from the last cursor position.

When it comes to a cursor where it is an auto incrementing id value, it's fairly easily to implement:

SELECT * FROM users
WHERE id <= %cursor // cursor is the auto incrementing id, ex. 100000
ORDER BY id DESC
LIMIT %limit

What we're not certain about, is if instead of an auto incrementing id cursor, the only unique sequential identifiers for the cursor are uuid and created_at on the table rows.

We can certainly query based on the uuid to get the created_at, and then select all users that are <= created_at but the issue is what if there are multiple instances of the same created_at timestamp in the users table? Any idea how to query the userstable based on uuid/created_at cursor combination to ensure we get the correct datasets (just as if we were using auto incrementing id)? Again, the only unique field is uuid since created_at may be duplicate, but their combination would be unique per row.

Was it helpful?

Solution

I will answer what you asked, but first let me tell you that I don't understand why you want to do that. An autoincremental id is very good for this task. But it is correct to also use a timestamp column, because it is a bad practice to rely on an id for sorting. Why? Because there are cases when its order might not be chronological - for example, if you use Galera cluster and you have failovers.

To do what you asked, first create this index:

ALTER TABLE users
    ADD INDEX idx_created_at_uuid (created_at, uuid);

The order columns is important. If you reverse it, the index will not be useful.

Now you just need to run a query like this:

SELECT some_columns
    FROM users
    WHERE created_at <= x AND uuid = y
    ORDER BY created_at DESC;

uuid is only needed because created_at is not unique. If created_at is not the first column, MySQL will have to read all rows and copy them to a temporary table (which could be in-memory or on-disk) to sort them.

If you decide to use the id, just keep the above snippets, but replace uuid with id.

OTHER TIPS

WHERE   created_at <= x
  AND ( created_at < x OR uuid < y )
ORDER BY created_at DESC,
         uuid       DESC

or this equivalent:

WHERE (     created_at < x
       OR ( created_at = x AND uuid < y )
      )
ORDER BY created_at DESC,
         uuid       DESC

This technique works for any pair of columns where the first one (created_at) could have duplicates and the second is unique (uuid or id).

And this is required:

INDEX(created_at, uuid)

Note that both parts of the WHERE are DESC. Mixing ASC and DESC will defeat the usability of the INDEX. (MySQL 8.0 can work around that.)

Note also, that this assumes you don't care what order the rows are in when created_at is duplicated, but you do want a consistent ordering. Note that uuid will be seemingly random, but still consistent. With that said, id (with or without Galera) and uuid work equally well.

(UUIDs suck, but that is a different discussion.)

More on pagination without using OFFSET .

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top