Well, to start with, you've stated two different requirements, and they aren't the same:
- 5 minutes before and 5 minutes after each hour.
- 2 observations before and 2 observations after each hour.
The second one relies on your sampling never missing its two-minute interval. I'd bet not only that your sampling will miss that interval someday, but that it already has. Sampling can fail for many reasons, but a 5-minute interval of time is always a 5-minute interval of time. Use the first requirement, not the second.
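To make that failure mode concrete, here's a small Python sketch. The readings are hypothetical two-minute samples with the 15:56 sample missing, and the variable names are made up for illustration. It compares the two rules for the readings before 16:00:

```python
from datetime import datetime, timedelta

# Hypothetical readings every 2 minutes before 16:00, with 15:56 missing.
readings = [datetime(2007, 9, 1, 15, m) for m in (50, 52, 54, 58)]
hour = datetime(2007, 9, 1, 16, 0)

# Requirement 1: everything in the 5 minutes before the hour.
by_time = [t for t in readings if hour - timedelta(minutes=5) <= t < hour]

# Requirement 2: the last 2 observations before the hour, however old they are.
by_count = sorted(t for t in readings if t < hour)[-2:]

for label, picked in (("5-minute window", by_time), ("last 2 readings", by_count)):
    print(label, [t.strftime("%H:%M") for t in picked])
```

With the 15:56 sample missing, the time-based rule just has one fewer reading. The count-based rule silently reaches back to 15:54, six minutes before the hour.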
This query generates one bucket per hour and averages all the readings that fall within 5 minutes of the hour. It includes both endpoints: for 16:00, it includes both 15:55 and 16:05.
with buckets as (
  select '00:00'::time + n * interval '1 hour' as bucket
  from generate_series(0, 23) n
)
select b.bucket, avg(t.observation) as obs_avg
from buckets b
inner join test t
  on t.test_ts::time between (b.bucket - interval '5' minute)
                         and (b.bucket + interval '5' minute)
group by b.bucket
order by b.bucket;
  bucket  | obs_avg
----------+---------
 16:00:00 | 5.7
 17:00:00 | 5.82
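If you want to sanity-check the endpoint behavior outside the database, here's a minimal Python model of a single bucket. The helper name window_average and the readings are hypothetical, not the data from the question. Both ends of the window are included, the way SQL BETWEEN behaves:

```python
from datetime import datetime, timedelta
from statistics import mean

def window_average(readings, center, half_width=timedelta(minutes=5)):
    """Average the values whose timestamps fall in
    [center - half_width, center + half_width], both ends included,
    like SQL BETWEEN. Returns None if nothing matches."""
    lo, hi = center - half_width, center + half_width
    values = [value for ts, value in readings if lo <= ts <= hi]
    return mean(values) if values else None

# Hypothetical readings around 16:00.
readings = [
    (datetime(2007, 9, 1, 15, 54), 9.0),  # 6 minutes early: excluded
    (datetime(2007, 9, 1, 15, 55), 5.0),  # lower endpoint: included
    (datetime(2007, 9, 1, 16, 0), 6.0),
    (datetime(2007, 9, 1, 16, 5), 7.0),   # upper endpoint: included
    (datetime(2007, 9, 1, 16, 6), 9.0),   # 6 minutes late: excluded
]
print(window_average(readings, datetime(2007, 9, 1, 16, 0)))  # 6.0
```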
If your values aren't normally distributed (and they don't appear to be), you should use the median instead of the average. The median is a better indication of central tendency when values are skewed or contain outliers. PostgreSQL has no function named median, but in version 9.4 and later the ordered-set aggregate percentile_cont(0.5) WITHIN GROUP (ORDER BY observation) returns it. On older versions, you'll need to create a median aggregate yourself.
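A quick Python illustration with made-up temperatures for one bucket shows how much a single bad reading can move the average but not the median:

```python
from statistics import mean, median

# Hypothetical temperatures for one bucket, with one spike from a bad reading.
temps = [5.5, 5.6, 5.7, 5.8, 12.0]

print(round(mean(temps), 2))  # 6.92: pulled up by the single spike
print(median(temps))          # 5.7: the middle value, unaffected
```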
My table's name is s_25, the time-date table is dt, and the variable is ambtemp
Based on that late comment, and assuming you meant your timestamp column is dt, here's my best guess.
with buckets as (
  select '00:00'::time + n * interval '1 hour' as bucket
  from generate_series(0, 23) n
)
select b.bucket, avg(t.ambtemp) as obs_avg
from buckets b
inner join s_25 t
  on t.dt::time between (b.bucket - interval '5' minute)
                    and (b.bucket + interval '5' minute)
group by b.bucket
order by b.bucket;
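If you're going to run this query over multiple dates, joining on time alone won't be sufficient: readings from every date get averaged into the same 24 buckets. Joining on time alone also misses the midnight bucket, because '00:00'::time - interval '5' minute wraps around to 23:55, and no time lies between 23:55 and 00:05. Use this instead. The CTEs "even_hours" and "obs_dates" return the pieces that are cross-joined in "bucket_midpoints" to produce every hour for the dates you're interested in. The WHERE clause in "obs_dates" keeps you from generating too many rows.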
with even_hours as (
select '00:00'::time + (n || ' hour')::interval as obs_time
from generate_series(0, 23) n
),
obs_dates as (
select distinct (dt)::date obs_date
from s_25
-- *This* WHERE clause keeps you from generating too many rows
-- for the join below. You *could* select the min and max dates
-- from your table, but you'd probably get too many rows that way.
  -- Half-open range, so readings later on 2007-09-30 are included too.
  where dt >= '2007-09-01' and dt < '2007-10-01'
),
bucket_midpoints as (
  -- date + time yields a timestamp, so no string concatenation is needed.
  select obs_dates.obs_date + even_hours.obs_time as bucket_midpoint
  from obs_dates cross join even_hours
)
select bucket_midpoint - interval '5' minute as bucket_start,
       bucket_midpoint + interval '5' minute as bucket_end,
       avg(s_25.ambtemp) as obs_avg
from bucket_midpoints
left join s_25
  on s_25.dt between (bucket_midpoint - interval '5' minute)
                 and (bucket_midpoint + interval '5' minute)
group by bucket_start, bucket_end
order by bucket_start, bucket_end;
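If you want to check the results outside the database, here's a rough Python model of the same bucketing logic. It models the query. It doesn't call PostgreSQL, and the function name bucket_averages and the sample readings are hypothetical. Empty buckets come back as None, the way avg() returns NULL for buckets with no matching rows in the LEFT JOIN:

```python
from datetime import datetime, time, timedelta
from statistics import mean

def bucket_averages(readings, start, end, half_width=timedelta(minutes=5)):
    """One bucket per hour for each date that has readings in [start, end),
    averaging readings within +/- half_width of the bucket midpoint.
    Buckets with no readings map to None."""
    # obs_dates: distinct dates with at least one reading in the range.
    obs_dates = sorted({ts.date() for ts, _ in readings if start <= ts < end})
    # bucket_midpoints: cross join of those dates with the 24 even hours.
    midpoints = [datetime.combine(d, time(hour=h)) for d in obs_dates for h in range(24)]
    result = {}
    for mid in midpoints:
        lo, hi = mid - half_width, mid + half_width
        # Like the LEFT JOIN, this scans all readings, not just [start, end).
        values = [v for ts, v in readings if lo <= ts <= hi]
        result[mid] = mean(values) if values else None
    return result

# Hypothetical readings straddling midnight.
readings = [
    (datetime(2007, 9, 1, 23, 57), 4.0),
    (datetime(2007, 9, 2, 0, 3), 6.0),
]
averages = bucket_averages(readings, datetime(2007, 9, 1), datetime(2007, 10, 1))
print(averages[datetime(2007, 9, 2, 0, 0)])   # 5.0
print(averages[datetime(2007, 9, 1, 16, 0)])  # None
```

Because the midpoints are full timestamps rather than times of day, 23:57 on September 1 and 00:03 on September 2 both land in the 2007-09-02 00:00 bucket.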