Return values at more/less specific time points

https://dba.stackexchange.com/questions/200105

26-12-2020
|

Question

In PostgreSQL 10.0, I have a big table with two columns:

CREATE TABLE xs (
  ts timestamp with time zone NOT NULL,
  x integer NOT NULL
)

Now, if I have 100 given time points, how do I return exactly 100 x-es that are right before (<=) these time points (for each of those points)?

For just a single one this would be very easy:

SELECT x FROM xs WHERE ts <= '2018-03-13 11:41:47.167973+00' LIMIT 1

But how to do this efficiently for a 100? Something like:

SELECT x FROM xs 
  WHERE ts FIRST_LESS_THAN_EQUAL_IN
    ('2018-03-13 11:41:47.167973+00', 
     '2018-03-13 11:41:47.198564+00', 
     '2018-03-13 11:41:47.555668+00',
      ...
     '2018-03-13 11:41:57.766888+00')

Obviously, there is an index on ts.

Solution

I suggest a LATERAL subquery, best with a LEFT [OUTER] JOIN to preserve all input rows (displaying a NULL value in case there should be no match at all):

SELECT v.reference_ts, t.most_recent_x
FROM  (
   VALUES
      (timestamptz '2018-03-13 11:41:47.167973+00') -- type cast in first row
    , ('2018-03-13 11:41:47.198564+00') 
    , ('2018-03-13 11:41:47.555668+00')
    , ('2018-03-13 11:41:57.766888+00')
   ) AS v(reference_ts)
LEFT JOIN LATERAL (
   SELECT x AS most_recent_x
   FROM   xs
   WHERE  ts <= v.reference_ts  -- ts is unique (?!); see below
   ORDER  BY ts DESC
   LIMIT  1
   ) AS t ON true;

A VALUES expression is one way to put your 100 timestamps into a derived table we can work with. The free-standing VALUES expression may require an explicit type cast in the first row. See:

Casting NULL type when updating multiple rows

There are others ways to pass your values, like an array (or an array literal cast to timestamptz[]) and unnest() in the query:

Optimizing a Postgres query with a large IN

About the LATERAL subquery:

What is the difference between LATERAL and a subquery in PostgreSQL?

Don't forget the ORDER BY or you get arbitrary results - like @a1ex07 commented.

If ts is not unique, it's undefined which of the duplicates is returned, unless you add more ORDER BY items to make it deterministic. Like: ORDER BY ts DESC, x DESC to get the row with the biggest x from any set of peers. And make the index match.

An index on (ts) (like you mentioned) works wonders on performance.
Or, ideally, on (ts DESC, x) if preconditions for index-only scans are met.

Does PostgreSQL use an index-only scan in this JOIN?

dbfiddle here

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange