Question

I need to update the "oldest" row in the database (for now, its limit 1, but I need to also be able to set it to n).

I'm essentially doing a constant stream of "update the oldest rows and retrieve them at the same time".

The column is indexed, it's called last_sent_at and I'm calling it ~20K times a minute.

The thing is, last_sent_at can be NULL and that should always be favored over ones with a filled in time. Once all of them are filled in it should choose the "oldest".

This works for me, the problem is I don't know if its efficient, and it also doesn't work when I need to update 5 rows at once. I guess I could use IN I'm just afraid it would be inefficient.

UPDATE subscribers
SET last_sent_at = '2018-11-17 00:02:27'
WHERE id = (
  SELECT id
  FROM subscribers
  ORDER BY last_sent_at NULLS FIRST
  LIMIT 1
)
RETURNING id;
Was it helpful?

Solution

If the table is big, the key to performance is a matching index on (last_sent_at NULLS FIRST).

Since you are updating so much, have aggressive VACUUM settings. This can be adjusted per table. See:

For only a single row, UPDATE with a subquery like you have is fine.

For LIMIT N with N > 1 a CTE is your safe bet. Like:

WITH cte AS (
   SELECT id
   FROM   subscribers
   ORDER  BY last_sent_at NULLS FIRST
   LIMIT  5
   )
UPDATE subscribers s
SET    last_sent_at = '2018-11-17 00:02:27'
FROM   cte
WHERE  s.id = cte.id
RETURNING id;

If you do this under concurrent write load, add another expression to your ORDER BY to make the order deterministic. And add the same expression to your index to match! (There can be many rows with NULL, right?) And use the same order for all commands taking row locks in the same table. Else you might face deadlocks from multiple transaction taking locks in arbitrary order. Better yet, use FOR UPDATE SKIP LOCKED if you are not absolutely bound to the strict order. Closely related, with in-depth explanation:

Related:

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top