Question

When I run the SQL Query:

 select generate_series(0, g)
 from ( select date(date1) - date(date2) as g from mytable ) t;

It returns an error:

 INFO:  Function "generate_series(integer,integer)" not supported.
 ERROR:  Specified types or functions (one per INFO message) not supported 
 on Redshift tables.

But when I run this query:

select  generate_series(0, g) from (select 5 as g)

It returns the below response:

 generate_series
-----------------
 0
 1
 2
 3
 4
 5
(6 rows)

Why does the second query work, while the first fails?


Solution

The generate_series() function is not fully supported by Redshift. See the Unsupported PostgreSQL functions section of the Amazon Redshift developer guide.

In these specific examples, the second query executes entirely on the leader node, since it does not need to scan any actual table data, while the first selects from a table and is therefore executed on the compute node(s), where the function is unavailable.
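
As a sketch of the distinction (mytable is the table from the question; the behavior described assumes a standard Redshift cluster):

-- No table is referenced, so this runs entirely on the leader node,
-- where leader-node-only functions like generate_series are available.
SELECT generate_series(0, 5);

-- This references a Redshift table, so it is planned for the compute
-- nodes, where generate_series is not supported, hence the error.
SELECT generate_series(0, g)
FROM (SELECT date(date1) - date(date2) AS g FROM mytable) t;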

UPDATE:

generate_series is working with Redshift now.

SELECT CURRENT_DATE::TIMESTAMP - (i * interval '1 day') AS date_datetime
FROM generate_series(1, 31) i
ORDER BY 1;

This generates one row per day for the last 31 days.

OTHER TIPS

You can use a window function to achieve a similar result. This requires an existing table to seed from (such as stv_blocklist) that has at least the number of rows you need, but not vastly more, which might slow things down.

with days as (
    select (dateadd(day, -row_number() over (order by true), sysdate::date)) as day 
    from [other_existing_table] limit 30
)
select day from days order by 1 asc

You can use this method to get other time ranges as well for bucketing purposes. This version generates all the minutes for the previous day so you could do a left join against it and bucket your data.

with buckets AS (
    select (dateadd(minute, -row_number() over (order by true), sysdate::date)) as minute 
    from [other_table] limit 1440
)
select minute from buckets order by 1 asc
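
For example, a left join against that minute series to bucket rows could look like this (the events table and event_time column are hypothetical):

with buckets as (
    select (dateadd(minute, -row_number() over (order by true), sysdate::date)) as minute
    from [other_table] limit 1440
)
select b.minute, count(e.event_time) as events_in_minute
from buckets b
left join events e
  on e.event_time >= b.minute
 and e.event_time < dateadd(minute, 1, b.minute)
group by b.minute
order by b.minute;

Empty minutes still appear with a count of zero, which is the point of seeding the buckets independently of the data.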

I may have first seen this here.

You will need to use functions that are supported by the leader node. The trick is to use the row_number() function over any table you like. Let's say we want to generate a date series from 10 days ago up to now:

   SELECT DATEADD('day', -n, (CURRENT_DATE+1)) AS generated_date
   FROM (SELECT ROW_NUMBER() OVER () AS n FROM my_table LIMIT 10) n
   ORDER BY generated_date DESC

And we get:

generated_date
2020-06-24 00:00:00
2020-06-23 00:00:00
2020-06-22 00:00:00
2020-06-21 00:00:00
2020-06-20 00:00:00
2020-06-19 00:00:00
2020-06-18 00:00:00
2020-06-17 00:00:00
2020-06-16 00:00:00
2020-06-15 00:00:00

You are correct that this does not work on Redshift.

You could use something like this

with ten_numbers as (
    select 0 as num union select 1 union select 2 union select 3 union select 4
    union select 5 union select 6 union select 7 union select 8 union select 9
),
generated_numbers as (
    select (1000 * t1.num) + (100 * t2.num) + (10 * t3.num) + t4.num - 5000 as gen_num
    from ten_numbers as t1
    join ten_numbers as t2 on 1 = 1
    join ten_numbers as t3 on 1 = 1
    join ten_numbers as t4 on 1 = 1
)
select gen_num from generated_numbers
where gen_num between -10 and 0
order by 1;

You are not using PostgreSQL. You are using Amazon Redshift.

Amazon Redshift does not support generate_series when used with Redshift tables. It says it right there in the error message.

Either use real PostgreSQL, or if you need Redshift's features, you must also work within the limitations of Redshift.

Your second example works because it does not use any Redshift tables.

Why it does not work was explained above. Still, the question of what to do about it remains open.

If you develop a BI system on any platform (with generators supported or not), it is very handy to have dimension tables with sequences of numbers and dates. How can you create one in Redshift?

  1. in Postgres, produce the necessary sequence using generator
  2. export to CSV
  3. create a table with the same schema in Redshift
  4. import the CSV from Step 2 to Redshift
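
A sketch of those steps (the S3 bucket, file path, and IAM role below are placeholders to replace with your own):

-- Step 1 (PostgreSQL): produce the sequence with the generator;
-- Step 2: the COPY ... TO writes it out as CSV
COPY (
    SELECT row_number() OVER () AS id, d::date AS date
    FROM generate_series('2017-01-01'::date, '2020-01-01'::date, interval '1 day') d
) TO '/tmp/calendar.csv' WITH (FORMAT csv);

-- Step 3 (Redshift): create a table with the same schema
CREATE TABLE calendar (id int, date date);

-- Step 4 (Redshift): import the CSV after uploading it to S3
COPY calendar
FROM 's3://your-bucket/calendar.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/your-redshift-role'
FORMAT AS CSV;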

Imagine you have created a very simple table called calendar:

 id, date
 1, 2017-01-01
 2, 2017-01-02
 ..., ...
 xxx, 2020-01-01

So your query will look like this:

SELECT t.id, t.date_1, t.date_2, c.id as date_id, c.date
FROM mytable t
JOIN calendar c
ON c.date BETWEEN t.date_1::date AND t.date_2::date
ORDER BY 1,4

In the calendar table you can also store the first date of the week, month, and quarter, plus the weekday (Mon, Tue, etc.), which makes such a table very effective for time-based aggregations.
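
As a sketch, such columns can be derived with date functions Redshift does support on compute nodes (the column names here are illustrative):

SELECT date,
       date_trunc('week', date)::date    AS week_start,
       date_trunc('month', date)::date   AS month_start,
       date_trunc('quarter', date)::date AS quarter_start,
       to_char(date, 'Dy')               AS weekday
FROM calendar;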

I've found in practice that counting rows off an existing table is too slow for my uses (and introduces a dependency).

Combining the "count rows" answer with Jon Scott's answer above, we get this solution, which we use at my company:

with n as (
    select 1 union select 2 union select 3 union select 4 union select 5 union
    select 6 union select 7 union select 8 union select 9 union select 10
  )
  select row_number() over () as n
  from n n1, n n2, n n3, n n4, n n5, n n6
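
As a usage sketch, the six-way cross join yields 10^6 sequential numbers; limiting and converting them gives a date series (the 30-day window here is illustrative):

with n as (
    select 1 union select 2 union select 3 union select 4 union select 5 union
    select 6 union select 7 union select 8 union select 9 union select 10
),
numbers as (
    select row_number() over () as n
    from n n1, n n2, n n3, n n4, n n5, n n6
)
select dateadd(day, -n, sysdate::date) as day
from numbers
where n <= 30
order by 1;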

This works here (PostgreSQL 9.3.3), so maybe your issue is just the result of a Redshift "feature"?

CREATE TABLE mytable
        ( date1 timestamp
        , date2 timestamp
        );
INSERT INTO mytable(date1,date2) VALUES
( '2014-03-30 12:00:00' , '2014-04-01 12:00:00' );

SELECT  generate_series(0, ss.g) FROM
   ( SELECT date(date2) - date(date1) AS g
     FROM mytable
   ) ss ;
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow