How can I centralize a complex windowing query used repeatedly but with different partitions each time

StackOverflow https://stackoverflow.com/questions/23352285

Question

I have some range-collapsing logic (based on http://wiki.postgresql.org/wiki/Range_aggregation) that I want to reuse over a variety of different column partitions.

Right now I'm doing this in PHP, with a function similar to the following that returns the query I want to run, with the relevant columns substituted in:

function getIntervalsQueryForPartition($partitions = array())
{
    // ... there is some validation logic here, not relevant to question

    $cols = implode(', ', $partitions) . ' ';

    return <<<SQL
SELECT $cols, MIN(start_date) start_date, MAX(end_date) end_date
FROM (
  SELECT $cols, start_date, end_date,
    MAX(new_start) OVER (
      PARTITION BY $cols
      ORDER BY start_date, end_date
    ) AS left_edge
  FROM (
    SELECT $cols, start_date, end_date,
    CASE WHEN GREATEST(
        MIN(start_date) OVER (
          PARTITION BY $cols
          ORDER BY start_date, end_date
          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ),
        start_date - INTERVAL '90 days'
    ) < (
    MAX(end_date) OVER (
        PARTITION BY $cols
        ORDER BY start_date, end_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
      )
    )
    THEN NULL
    ELSE start_date
    END AS new_start
    FROM product_activity
  ) s1
) s2
GROUP BY $cols, left_edge
SQL;
}
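To try the query without the real data, a throwaway table is enough. The layout below is an assumption: the question does not show the schema of product_activity, so the partition columns a1, a2, a3 and a5 are borrowed from the calls in the answer, and the rows are made up.

```sql
-- Hypothetical stand-in for product_activity; the real schema is not shown.
create table product_activity (
    a1         int,
    a2         int,
    a3         int,
    a5         int,
    start_date date not null,
    end_date   date not null
);

-- Made-up rows. For partition (a1, a2) = (1, 1), the March row starts
-- less than 90 days after the January row ends, so it joins that range;
-- the September row starts more than 90 days after March ends, so it
-- opens a new range. The (1, 2) row sits in a partition of its own.
insert into product_activity values
    (1, 1, 1, 1, '2014-01-01', '2014-01-31'),
    (1, 1, 1, 1, '2014-03-01', '2014-03-31'),
    (1, 1, 1, 1, '2014-09-01', '2014-09-30'),
    (1, 2, 1, 1, '2014-02-01', '2014-02-28');
```

With these rows, partitioning by a1, a2 should yield three ranges (in no guaranteed order): 2014-01-01 to 2014-03-31 and 2014-09-01 to 2014-09-30 for (1, 1), and 2014-02-01 to 2014-02-28 for (1, 2).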

Ultimately there are many different column partitions on product_activity over which I want to perform this same windowing and aggregation. I'd prefer not to copy and paste the query all over the place with slightly different partitions, hence the PHP function above.

How can I accomplish the same abstraction entirely within Postgres? Can this be done with a stored procedure? I'd like a DBA to be able to invoke this query for different partitions without having to copy and paste it and then edit all 7 places where the columns are specified.


Solution

You can write a function much as you did in PHP. A PL/pgSQL function cannot change its result columns from call to call, and here those columns depend on the partition, so the simplest option is a function that takes one text parameter and returns setof record.

create or replace function func (cols text)
returns setof record language plpgsql as $$
begin
    return query execute format (
        'SELECT %s, MIN(start_date) start_date, MAX(end_date) end_date
        FROM (
          SELECT %s, start_date, end_date,
            MAX(new_start) OVER (
              PARTITION BY %s
              ORDER BY start_date, end_date
            ) AS left_edge
          FROM (
            SELECT %s, start_date, end_date,
            CASE WHEN GREATEST(
                MIN(start_date) OVER (
                  PARTITION BY %s
                  ORDER BY start_date, end_date
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                ),
                start_date - INTERVAL ''90 days''
            ) < (
            MAX(end_date) OVER (
                PARTITION BY %s
                ORDER BY start_date, end_date
                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
              )
            )
            THEN NULL
            ELSE start_date
            END AS new_start
            FROM product_activity
          ) s1
        ) s2
        GROUP BY %s, left_edge',
        cols, cols, cols, cols, cols, cols, cols);
end $$;
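One caveat: func splices cols into the statement as raw SQL text, so it is only safe when the caller is trusted. If you want the function to accept column names and nothing else, one option is to take a text[] and quote each element with quote_ident. The sketch below is a hypothetical variant (func_safe is a made-up name); it also uses format's positional %1$s placeholder so the column list is passed once instead of seven times. The query text is unchanged.

```sql
-- Hypothetical variant of func: partition columns arrive as a text[] and
-- each one is passed through quote_ident, so an element such as
-- 'a1; drop table x' becomes a (nonexistent) quoted identifier, not SQL.
create or replace function func_safe (cols text[])
returns setof record language plpgsql as $$
declare
    col_list text;  -- NULL until the first column is appended
    c        text;
begin
    -- array_length returns NULL for both a NULL and an empty array.
    if array_length(cols, 1) is null then
        raise exception 'func_safe: at least one partition column is required';
    end if;

    -- Build "a1, a2" from {a1,a2}; concat_ws skips the initial NULL.
    foreach c in array cols loop
        col_list := concat_ws(', ', col_list, quote_ident(c));
    end loop;

    -- %1$s reuses the first argument at every placeholder.
    return query execute format (
        'SELECT %1$s, MIN(start_date) start_date, MAX(end_date) end_date
        FROM (
          SELECT %1$s, start_date, end_date,
            MAX(new_start) OVER (
              PARTITION BY %1$s
              ORDER BY start_date, end_date
            ) AS left_edge
          FROM (
            SELECT %1$s, start_date, end_date,
            CASE WHEN GREATEST(
                MIN(start_date) OVER (
                  PARTITION BY %1$s
                  ORDER BY start_date, end_date
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                ),
                start_date - INTERVAL ''90 days''
            ) < (
            MAX(end_date) OVER (
                PARTITION BY %1$s
                ORDER BY start_date, end_date
                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
              )
            )
            THEN NULL
            ELSE start_date
            END AS new_start
            FROM product_activity
          ) s1
        ) s2
        GROUP BY %1$s, left_edge',
        col_list);
end $$;
```

It is called the same way, with an array literal: select * from func_safe(array['a1', 'a2']) as (a1 int, a2 int, start_date date, end_date date). Note that quote_ident double-quotes any name containing capitals, which makes it case-sensitive, so pass column names exactly as they are stored.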

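The only disadvantage of this method is the way you call the function: because it returns setof record, every call must supply a column definition list giving the name and type of each result column, and those types must match what the query actually returns.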

select * from func('a1, a2')
as (a1 int, a2 int, start_date date, end_date date);

select * from func('a1, a3, a5')
as (a1 int, a3 int, a5 int, start_date date, end_date date);
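If a DBA runs the same partition often, the column definition list can be written once in a thin wrapper. A minimal sketch, assuming func from above already exists (the body of a SQL-language function is checked when it is created) and using the hypothetical name intervals_by_a1_a2:

```sql
-- Hypothetical fixed-shape wrapper around func for the (a1, a2) partition.
-- RETURNS TABLE declares the result columns, so callers no longer need
-- a column definition list.
create or replace function intervals_by_a1_a2 ()
returns table (a1 int, a2 int, start_date date, end_date date)
language sql as $$
    select *
    from func('a1, a2')
        as f (a1 int, a2 int, start_date date, end_date date);
$$;

-- Usage: no "as (...)" clause required.
select * from intervals_by_a1_a2();
```

The trade-off is one small wrapper per frequently used partition, but the windowing query itself still lives only in func.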
Licensed under: CC-BY-SA with attribution