Question

I have a table like this:

+------------+------------------+
|temperature |Date_time_of_data |
+------------+------------------+
| 4.5        |9/15/2007 12:12:12|
| 4.56       |9/15/2007 12:14:16|
| 4.44       |9/15/2007 12:16:02|
| 4.62       |9/15/2007 12:18:23|
| 4.89       |9/15/2007 12:21:01|
+------------+------------------+

The data set contains more than 1000 records, and I want to check for minimum variability. For every 30-minute period, if the variance of temperature doesn't exceed 0.2, I want all the temperature values in that half hour replaced by NULL.


Solution

Here is a SELECT to get the start of a period for every record:

SELECT temperature,
       Date_time_of_data, 
       date_trunc('hour', Date_time_of_data)+
       CASE WHEN date_part('minute', Date_time_of_data) >= 30
            THEN interval '30 minutes'
            ELSE interval '0 minutes'
       END as start_of_period
FROM your_table

It truncates the timestamp to the hour (9/15/2007 12:12:12 becomes 9/15/2007 12:00:00) and then adds 30 minutes if the minute part was 30 or more.
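To see what this expression produces, here is a small self-contained check on literal timestamps. The values are made up for illustration and the alias names (sample, ts) are hypothetical; no table is needed.

```sql
-- Illustrative timestamps only; not taken from the question's table.
SELECT ts,
       date_trunc('hour', ts) +
       CASE WHEN date_part('minute', ts) >= 30
            THEN interval '30 minutes'
            ELSE interval '0 minutes'
       END AS start_of_period
FROM (VALUES (timestamp '2007-09-15 12:12:12'),
             (timestamp '2007-09-15 12:29:59'),
             (timestamp '2007-09-15 12:30:00'),
             (timestamp '2007-09-15 12:59:59')) AS sample(ts);
-- The first two rows map to 2007-09-15 12:00:00,
-- the last two map to 2007-09-15 12:30:00.
```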

Next, use start_of_period to partition the results and get the max and min temperature for every period:

SELECT temperature,
       Date_time_of_data,
       max(temperature) OVER (PARTITION BY start_of_period) as max_temp,
       min(temperature) OVER (PARTITION BY start_of_period) as min_temp
FROM (previous_select_here) AS periods

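Next, keep only the records from periods where the spread (max_temp - min_temp, used here as a simple stand-in for the variance) is at most 0.2. Note that this is the range, not the statistical variance; PostgreSQL's var_samp aggregate gives the latter.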

SELECT temperature,
       Date_time_of_data
FROM (previous_select_here) AS spread
WHERE (max_temp - min_temp) <= 0.2

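And finally, update your table. IN needs a single-column subquery, so select only Date_time_of_data from the previous result:

UPDATE your_table
SET temperature = NULL
WHERE Date_time_of_data IN (SELECT Date_time_of_data FROM (previous_select_here) AS low_variability)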

You may need to correct some mistakes in these queries before they work; I haven't tested them. You can also simplify them if you need to.
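The steps above can be combined into a single statement with common table expressions. This is only a sketch: it assumes the table is called your_table as in the placeholders above, the CTE names periods and spread are made up, and it has not been run against the asker's real data.

```sql
-- Sketch: your_table is the placeholder name used in the answer.
WITH periods AS (
    -- Step 1: start of the half-hour period for every record
    SELECT Date_time_of_data,
           temperature,
           date_trunc('hour', Date_time_of_data) +
           CASE WHEN date_part('minute', Date_time_of_data) >= 30
                THEN interval '30 minutes'
                ELSE interval '0 minutes'
           END AS start_of_period
    FROM your_table
),
spread AS (
    -- Step 2: highest and lowest temperature in each period
    SELECT Date_time_of_data,
           max(temperature) OVER (PARTITION BY start_of_period) AS max_temp,
           min(temperature) OVER (PARTITION BY start_of_period) AS min_temp
    FROM periods
)
-- Steps 3 and 4: null the temperatures in periods with a spread of at most 0.2
UPDATE your_table
SET temperature = NULL
WHERE Date_time_of_data IN (
    SELECT Date_time_of_data
    FROM spread
    WHERE (max_temp - min_temp) <= 0.2
);
```

On the five sample rows from the question the spread is 4.89 - 4.44 = 0.45, so this statement would leave them unchanged.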

P.S. If you only need to filter out the data with a spread of 0.2 or less, you can simply create a VIEW from the third SELECT with

 WHERE (max_temp - min_temp) > 0.2

and use the VIEW instead of the table.
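A sketch of such a view, with the placeholder subqueries written out. The view name variable_temperatures and the subquery aliases are hypothetical, and your_table is the same placeholder as above.

```sql
-- Hypothetical view name; keeps only records from periods with a spread above 0.2.
CREATE VIEW variable_temperatures AS
SELECT temperature,
       Date_time_of_data
FROM (
    SELECT temperature,
           Date_time_of_data,
           max(temperature) OVER (PARTITION BY start_of_period) AS max_temp,
           min(temperature) OVER (PARTITION BY start_of_period) AS min_temp
    FROM (
        SELECT temperature,
               Date_time_of_data,
               date_trunc('hour', Date_time_of_data) +
               CASE WHEN date_part('minute', Date_time_of_data) >= 30
                    THEN interval '30 minutes'
                    ELSE interval '0 minutes'
               END AS start_of_period
        FROM your_table
    ) AS periods
) AS spread
WHERE (max_temp - min_temp) > 0.2;
```

The underlying table stays untouched with this approach, so the low-variability readings remain available if the threshold changes later.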

Other tips

This query should do the job:

with intervals as (
    select
        date_trunc('hour', Date_time_of_data) + interval '30 min' * round(date_part('minute', Date_time_of_data) / 30.0) as valid_interval
    from T
    group by 1
    having var_samp(temperature) > 0.2
)
select * from T
where 
    date_trunc('hour', Date_time_of_data) + interval '30 min' * round(date_part('minute', Date_time_of_data) / 30.0) in (select valid_interval from intervals)

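The inner query (labeled intervals) returns the half-hour intervals whose sample variance is over 0.2 (having var_samp(temperature) > 0.2). The date_trunc ... expression rounds Date_time_of_data to the nearest half hour, so each interval is roughly centered on :00 or :30 (for example, 12:16:02 maps to 12:30:00). Use floor instead of round if the intervals should start at :00 and :30.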

The query returns nothing on the provided dataset.

create table T (temperature float8, Date_time_of_data timestamp without time zone);
insert into T values 
    (4.5, '2007-9-15 12:12:12'),
    (4.56, '2007-9-15 12:14:16'),
    (4.44, '2007-9-15 12:16:02'),
    (4.62, '2007-9-15 12:18:23'), 
    (4.89, '2007-9-15 12:21:01')
;
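
The query above only selects the rows to keep. If the values should actually be replaced by NULL, as the question asks, the same idea can be turned into an UPDATE. This is a sketch, not part of the original answer: the CTE name quiet is made up, and it uses floor instead of round so that the windows start at :00 and :30.

```sql
-- Sketch: null temperatures in half-hour windows whose sample variance is at most 0.2.
-- floor() makes windows start at :00 and :30; use round() to match the select above.
with quiet as (
    select
        date_trunc('hour', Date_time_of_data) + interval '30 min' * floor(date_part('minute', Date_time_of_data) / 30.0) as quiet_interval
    from T
    group by 1
    having var_samp(temperature) <= 0.2
)
update T
set temperature = null
where
    date_trunc('hour', Date_time_of_data) + interval '30 min' * floor(date_part('minute', Date_time_of_data) / 30.0) in (select quiet_interval from quiet);
```

With the five sample rows above, all readings fall into the 12:00 window and their sample variance is roughly 0.03, so all five temperatures would be set to NULL. A window that contains a single reading has a NULL var_samp and is therefore left alone.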
License: CC-BY-SA with attribution
Not affiliated with StackOverflow