Question

I am using Postgres to keep track of time series data.

| ts | value |
|----|-------|
| 1  | 5     |
| 2  | 2     |
| 5  | 10    |

The meaning of the value field is "number of events that happened since the previous timestamp". For example, in seconds 3-5 there were 10 events, as reflected in the value for ts 5.

Our user can ask something like: "show me a line chart of seconds 0-6 which has data points in intervals of 4 seconds". So we will show two data points, for seconds 0 and 4, each one summarizing the number of events up to the next point. So the line chart will use this data:

| ts | value |
|----|-------|
| 0  | 7     |
| 4  | 10    |

(since 5+2=7 and 10=10)
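The bucketing arithmetic above can be reproduced with a small Python sketch (illustrative only, outside Postgres; the helper name `bucket_sums` is ours, not part of any query):

```python
def bucket_sums(rows, start, end, interval):
    """Group (ts, value) rows into buckets of width `interval`, summing
    values; mirrors start + floor((ts - start) / interval) * interval."""
    sums = {}
    for ts, value in rows:
        if start <= ts <= end:
            bucket = start + ((ts - start) // interval) * interval
            sums[bucket] = sums.get(bucket, 0) + value
    return sums

rows = [(1, 5), (2, 2), (5, 10)]
print(bucket_sums(rows, start=0, end=6, interval=4))  # {0: 7, 4: 10}
```

Rows at ts 1 and 2 land in bucket 0 (5 + 2 = 7), and the row at ts 5 lands in bucket 4.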

We use this query to get this data. The user can ask for arbitrary start (time), end (time), and interval values. Also, ts is of type integer (but can be adjusted to timestamp).

SELECT start+round((ts-start)/interval)*interval as ts1, sum(value)
FROM events
WHERE ts >= start AND ts <= end
GROUP BY ts1
  1. Is this the right query, and more importantly the right approach?
  2. I assume indexes on ts will be ignored, anything I can do about that?
  3. Is there any way to add linear interpolation logic to the query, so that the 10 events that happened in seconds 3-5 (in the original table) are not attributed entirely to second 4 (of the result set), since a third of them actually happened in second 3 (of the original data)?

You can use this sqlfiddle.

Solution

One problem with the way you are currently doing it is that it does not generate a data point for intervals which contain no sample data. For example, if the user wants a chart of seconds 0-10 in steps of 1, your chart won't have any points after 5. Maybe that doesn't matter in your use case, though.

Another issue, as you indicated, is that it would be nice to use some kind of linear interpolation to apportion the measurements when the resolution of the requested plot is finer than that of the available data.

To solve the first of these, instead of selecting data purely from the sample table, we can join together the data with a generated series that matches the user's request. The latter can be generated using this:

SELECT int4range(rstart, rstart+1) AS srange 
FROM generate_series(0,10,1) AS seq(rstart)

The above query will generate a series of ranges, from 0 to 10 with a step size of 1. The output looks like this:

 srange
---------
 [0,1)
 [1,2)
 [2,3)
 [3,4)
 [4,5)
 [5,6)
 [6,7)
 [7,8)
 [8,9)
 [9,10)
 [10,11)
(11 rows)

We can join this to the data table using the && operator, which matches rows whose ranges overlap.

The second point can be addressed by calculating the proportion of each data row which falls into each sample window.
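For example, the data row covering [2,5) with value 10 overlaps the sample window [4,5) for one second out of three, so it should contribute 10 × 1/3 ≈ 3.33 to that window. A minimal sketch of that arithmetic (illustrative Python, not part of the query; the helper name `apportion` is ours):

```python
def apportion(value, drange, srange):
    """Share of `value` falling into sample range srange, for half-open
    integer ranges, proportional to the length of the intersection."""
    lo = max(drange[0], srange[0])
    hi = min(drange[1], srange[1])
    if hi <= lo:
        return 0.0  # no overlap
    return value * (hi - lo) / (drange[1] - drange[0])

print(apportion(10, (2, 5), (4, 5)))  # ~3.333: one of three seconds overlaps
```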

Here is the full query:

SELECT lower(srange) AS t,
    sum (CASE 
        -- when data range is fully contained in sample range
        WHEN drange <@ srange THEN value
        -- when data range and sample range overlap, calculate the ratio of the intersection
        -- and use that to apportion the value
        ELSE CAST (value AS DOUBLE PRECISION) * (upper(drange*srange) - lower(drange*srange)) / (upper(drange)-lower(drange))
    END) AS value
FROM (
    -- Generate the range to be plotted (the sample ranges).
    -- To change the start / end of the range, change the 1st 2 arguments
    -- of the generate_series. To change the step size change BOTH the 3rd
    -- argument and the amount added to rstart (they must be equal).
    SELECT int4range(rstart, rstart+1) AS srange FROM generate_series(0,10,1) AS seq(rstart)
) AS s
LEFT JOIN (
    -- Note the use of the lag window function so that for each row, we get
    -- a range from the previous timestamp up to the current timestamp
    SELECT int4range(coalesce(lag(ts) OVER (order by ts), 0), ts) AS drange, value FROM data
) AS d ON srange && drange
GROUP BY lower(srange)
ORDER BY lower(srange)

Result:

 t  |      value
----+------------------
  0 |                5
  1 |                2
  2 | 3.33333333333333
  3 | 3.33333333333333
  4 | 3.33333333333333
  5 |
  6 |
  7 |
  8 |
  9 |
 10 |
(11 rows)
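To sanity-check the query, the same join-and-apportion logic can be simulated in plain Python over the sample data (an illustration of the logic, not a substitute for the SQL):

```python
# Sample data: (ts, value); each data range runs from the previous
# timestamp (or 0 for the first row) up to the current timestamp.
data = [(1, 5), (2, 2), (5, 10)]

dranges = []
prev = 0
for ts, value in data:
    dranges.append(((prev, ts), value))  # half-open range [prev, ts)
    prev = ts

result = {}
for s in range(0, 11):                   # sample windows [s, s+1)
    srange = (s, s + 1)
    total = None                         # None mirrors SQL NULL (no data)
    for (d0, d1), value in dranges:
        lo, hi = max(d0, srange[0]), min(d1, srange[1])
        if hi > lo:                      # ranges overlap
            total = (total or 0) + value * (hi - lo) / (d1 - d0)
    result[s] = total

for t in sorted(result):
    print(t, result[t])
```

Windows 0 and 1 get the full values 5 and 2, windows 2-4 each get a third of 10, and windows 5-10 stay empty, matching the result table above.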

It is not likely any index will be used on ts in this query as it stands, and if the data table is large then performance is going to be dreadful.

There are some things you could try to help with this. One suggestion would be to redesign the data table so that the first column contains the time range of each sample rather than just its end time; you could then add an index on that range column (e.g. a GiST index), remove the window function from the second subquery, and hopefully the index can be used.

Read up on range types in the PostgreSQL documentation.

Caveat Emptor: I have not tested this other than on the tiny data sample you supplied. I have used something similar to this for a somewhat similar purpose though.

Licensed under: CC-BY-SA with attribution