Question

I have two timestamp columns in a table:

     usage_from      |     usage_till     
---------------------+--------------------
 2013-10-09 23:08:17 | 2013-10-09 23:16:00
 2013-10-09 23:08:17 | 2013-10-09 23:08:19
 2013-10-09 23:08:17 | 2013-10-10 18:58:22
 2013-10-09 23:08:17 | 2013-10-09 23:15:05
 2013-10-09 23:08:17 | 2013-10-09 23:09:00
 2013-10-09 23:08:17 | 2013-10-09 23:08:20
 2013-10-09 23:08:17 | 2013-10-09 23:32:04
 2013-10-09 23:08:17 | 2013-10-10 02:02:03
 2013-10-09 23:08:17 | 2013-10-10 07:31:00
 2013-10-09 23:08:17 | 2013-10-10 22:41:04

I need to split these rows as shown below:

     usage_from      |     usage_till
---------------------+---------------------
 2013-10-09 23:08:17 | 2013-10-09 23:16:00
 2013-10-09 23:08:17 | 2013-10-09 23:08:19
 2013-10-09 23:08:17 | 2013-10-10 02:00:00
 2013-10-10 02:00:00 | 2013-10-10 18:58:22  -- split
 2013-10-09 23:08:17 | 2013-10-09 23:15:05
 2013-10-09 23:08:17 | 2013-10-09 23:09:00
 2013-10-09 23:08:17 | 2013-10-09 23:08:20
 2013-10-09 23:08:17 | 2013-10-09 23:32:04
 2013-10-09 23:08:17 | 2013-10-10 02:00:00
 2013-10-10 02:00:00 | 2013-10-10 02:02:03  -- split
 2013-10-09 23:08:17 | 2013-10-10 02:00:00
 2013-10-10 02:00:00 | 2013-10-10 07:31:00  -- split
 2013-10-09 23:08:17 | 2013-10-10 02:00:00
 2013-10-10 02:00:00 | 2013-10-10 22:41:04  -- split

In the example above, every range that crosses 02:00:00 is split at that time.

After many attempts I managed to split the values into separate columns, as below, but I could not get the parts into separate rows.

     usage_from      |     usage_till      |     end_time_1      |     end_time_2      
---------------------+---------------------+---------------------+---------------------
 2013-10-09 23:08:17 | 2013-10-09 23:16:00 | 2013-10-09 23:16:00 | 2013-10-11 02:00:00 
 2013-10-09 23:08:17 | 2013-10-09 23:08:19 | 2013-10-09 23:08:19 | 2013-10-11 02:00:00 
 2013-10-09 23:08:17 | 2013-10-10 18:58:22 | 2013-10-10 02:00:00 | 2013-10-10 18:58:22 
 2013-10-09 23:08:17 | 2013-10-09 23:15:05 | 2013-10-09 23:15:05 | 2013-10-11 02:00:00 
 2013-10-09 23:08:17 | 2013-10-09 23:09:00 | 2013-10-09 23:09:00 | 2013-10-11 02:00:00 
 2013-10-09 23:08:17 | 2013-10-09 23:08:20 | 2013-10-09 23:08:20 | 2013-10-11 02:00:00 
 2013-10-09 23:08:17 | 2013-10-09 23:32:04 | 2013-10-09 23:32:04 | 2013-10-11 02:00:00 
 2013-10-09 23:08:17 | 2013-10-10 02:02:03 | 2013-10-10 02:00:00 | 2013-10-10 02:02:03 
 2013-10-09 23:08:17 | 2013-10-10 07:31:00 | 2013-10-10 02:00:00 | 2013-10-10 07:31:00 
 2013-10-09 23:08:17 | 2013-10-10 22:41:04 | 2013-10-10 02:00:00 | 2013-10-10 22:41:04 

Any idea how to do this? I have been struggling for the last few days.
I'm using Redshift 1.0.757 (based on PostgreSQL 8.0.2).


Solution

If Redshift supported the basic form of generate_series() [1], this might work. At least it works in Postgres 8.3:

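-- The split point is the last 2 a.m. at or before usage_till. Deriving it from
-- (usage_till - 2h)::date also covers ranges that end before 2 a.m.
SELECT CASE WHEN split > 0 AND g = 1           -- second part starts at 2 a.m.
            THEN (usage_till - '2:0'::time)::date + '2:0'::time
            ELSE usage_from END AS usage_from
     , CASE WHEN split > 0 AND g = 0           -- first part ends at 2 a.m.
            THEN (usage_till - '2:0'::time)::date + '2:0'::time
            ELSE usage_till END AS usage_till
FROM  (
   SELECT *, generate_series(0, split) AS g    -- 1 row if split = 0, 2 rows if split = 1
   FROM  (
      SELECT *
           , (usage_till - '2:0'::time)::date
           - (usage_from - '2:0'::time)::date AS split  -- integer: number of 2 a.m. crossings
      FROM   t
      ) sub1
   ) sub2;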

How?

  • In the inner subquery sub1, I check whether the time range crosses 2 a.m. and store the result in the column split. I am assuming the time range never crosses 2 a.m. more than once, but the query could easily be adapted to handle that: generate_series() automatically generates one additional row per crossing.

  • In the next subquery sub2 generate_series() generates two rows where a split is needed.

  • In the outer SELECT, CASE expressions adjust the timestamps accordingly.

  • Normally I would use interval '2 hours' instead of '2:0'::time, but I seem to remember Redshift doesn't support the interval type.
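To make the split column from the first bullet more concrete, here is a minimal sketch for plain Postgres (8.2 or later, because of VALUES in the FROM list). It uses interval '2 hours' rather than '2:0'::time, and the two sample rows are taken from the question. Shifting both timestamps back by two hours turns "crosses 2 a.m." into "crosses midnight", and subtracting the two dates then counts the crossings:

```sql
-- Sketch: compute only the "split" counter for two sample rows.
SELECT usage_from
     , usage_till
     , (usage_till - interval '2 hours')::date
     - (usage_from - interval '2 hours')::date AS split
FROM  (
   VALUES (timestamp '2013-10-09 23:08:17', timestamp '2013-10-09 23:16:00')
        , (timestamp '2013-10-09 23:08:17', timestamp '2013-10-10 02:02:03')
   ) AS t(usage_from, usage_till);
```

The first row stays on the same shifted date, so split is 0. The second row moves to the next shifted date, so split is 1 and the row needs to be split.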

SQL Fiddle for Postgres 8.3.

Not in Redshift?

If Redshift only allows generate_series() in the FROM list and not in the SELECT list, you are out of luck. Calling a set-returning function in the SELECT list, as the query above does, already is the ancient form. In modern Postgres, you would use a LATERAL join instead. You could try your luck with regexp_split_to_table(), but that is not in Postgres 8.0 either.

[1] But the manual says generate_series() is unsupported.

Barring that, I can only think of a procedural solution with PL/pgSQL. But Redshift might be limited there, too ...
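For comparison, the LATERAL form mentioned above could look roughly like this in modern Postgres (9.3 or later). This is only a sketch under the same assumption as before (a range crosses 2 a.m. at most once); it will not run on Redshift if generate_series() is unsupported there. The CTE with sample rows and the column name split_at are made up for illustration:

```sql
-- Sketch for modern Postgres only; sample data stands in for the real table t.
WITH t(usage_from, usage_till) AS (
   VALUES (timestamp '2013-10-09 23:08:17', timestamp '2013-10-09 23:16:00')
        , (timestamp '2013-10-09 23:08:17', timestamp '2013-10-10 18:58:22')
   )
SELECT CASE WHEN g = 0       THEN s.usage_from ELSE s.split_at END AS usage_from
     , CASE WHEN g = s.split THEN s.usage_till ELSE s.split_at END AS usage_till
FROM  (
   SELECT usage_from
        , usage_till
          -- number of 2 a.m. crossings (0 or 1 under the stated assumption)
        , (usage_till - interval '2 hours')::date
        - (usage_from - interval '2 hours')::date AS split
          -- last 2 a.m. at or before usage_till
        , (usage_till - interval '2 hours')::date + interval '2 hours' AS split_at
   FROM   t
   ) s
CROSS  JOIN LATERAL generate_series(0, s.split) AS g
ORDER  BY 1, 2;
```

With the two sample rows, the first row is returned unchanged and the second comes back as two rows that meet at 2013-10-10 02:00:00.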

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow