Question

I've recently been working through "Seven Databases in Seven Weeks". One of the tasks is to build a pivot table using crosstab(); the task is already described here: https://stackoverflow.com/questions/34665186/creating-a-calendar-like-table-using-crosstab-in-postgres?rq=1

While implementing it, I ran into some PostgreSQL "magic" involving the ORDER BY clause.

The right query is:

SELECT * FROM crosstab (
  'SELECT extract(week from starts) AS week,
     extract(dow from starts) AS day, count(*) 
   FROM events 
   GROUP BY week, day 
   ORDER BY week',
  'SELECT generate_series(0,6)'
) AS (
week int, 
Sunday int, Monday int, Tuesday int, 
Wednesday int, Thursday int, Friday int, Saturday int
) ORDER BY week;

The data in the table is the following:

[Image: contents of the events table]

It returns this data:

[Image: query output, 10 rows]

If I omit the ORDER BY week in the first query passed to crosstab(), it returns 12 rows instead of 10:

SELECT * FROM crosstab (
  'SELECT extract(week from starts) AS week,
     extract(dow from starts) AS day, count(*) 
   FROM events 
   GROUP BY week, day',
  'SELECT generate_series(0,6)'
) AS (
week int, 
Sunday int, Monday int, Tuesday int, 
Wednesday int, Thursday int, Friday int, Saturday int
) ORDER BY week;

Output:

[Image: query output, 12 rows]

There are three strange things here:

  • when I run EXPLAIN VERBOSE on both queries, the query plans are identical except for the ORDER BY clause inside the function call;
  • when I run the sub-queries separately, both return 12 rows;
  • an ORDER BY clause should affect only the order of the rows.
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Sort  (cost=59.83..62.33 rows=1000 width=32)
   Output: week, sunday, monday, tuesday, wednesday, thursday, friday, saturday
   Sort Key: crosstab.week
   ->  Function Scan on public.crosstab  (cost=0.00..10.00 rows=1000 width=32)
         Output: week, sunday, monday, tuesday, wednesday, thursday, friday, saturday
         Function Call: crosstab('SELECT extract(week from starts) AS week,
      extract(dow from starts) AS day, count(*) 
    FROM events 
    GROUP BY week, day 
    ORDER BY week'::text, 'SELECT generate_series(0,6)'::text)
(10 rows)
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Sort  (cost=59.83..62.33 rows=1000 width=32)
   Output: week, sunday, monday, tuesday, wednesday, thursday, friday, saturday
   Sort Key: crosstab.week
   ->  Function Scan on public.crosstab  (cost=0.00..10.00 rows=1000 width=32)
         Output: week, sunday, monday, tuesday, wednesday, thursday, friday, saturday
         Function Call: crosstab('SELECT extract(week from starts) AS week,
      extract(dow from starts) AS day, count(*) 
    FROM events 
    GROUP BY week, day'::text, 'SELECT generate_series(0,6)'::text)
(9 rows)

How can this behavior be explained? The closest question I found was this one, but I don't think it is the same: Why does changing the sort order return a different number of results? Is it a bug?

Thanks in advance.


Solution

Omitting the ORDER BY goes against what the documentation recommends:

In practice the source_sql query should always specify ORDER BY 1 to ensure that values with the same row_name are brought together

This ORDER BY is a requirement of the algorithm inside crosstab(), which counts on rows with identical row_name values being contiguous. Without it, rows for the same week can arrive in separate runs, and each run produces its own output row, which is how you end up with 12 rows instead of 10.

What happens inside crosstab() is a black box as far as the executor of the main query is concerned. It doesn't know and doesn't care that this algorithm is sensitive to the order of the rows inside crosstab(). To start with, it doesn't even know that the arguments passed to crosstab() are queries: to the main query they are just text values. These queries are executed by a different instance of the executor, as dynamic SQL. That is also why EXPLAIN shows the same plan for both statements.

Your reasoning in the question seems to ignore that, which is why you see properties in this query that are unusual in normal SQL queries, which don't involve embedding dynamic SQL.
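
The contiguity requirement can be illustrated with a small model in Python. This is only a sketch of the behavior, not the actual crosstab() implementation (which is written in C); the function name crosstab_model and the sample rows are made up for illustration and are not the data from the question. Like crosstab(), itertools.groupby only groups adjacent items that share a key.

```python
from itertools import groupby


def crosstab_model(source_rows, categories):
    """Toy model: start a new output row whenever row_name changes."""
    result = []
    for row_name, group in groupby(source_rows, key=lambda r: r[0]):
        # Collect the values of one contiguous run of rows, keyed by category.
        values = {category: value for _, category, value in group}
        result.append((row_name, *[values.get(c) for c in categories]))
    return result


# Hypothetical (week, day, count) rows in an arbitrary, unsorted order.
unsorted_rows = [(1, 0, 5), (2, 1, 3), (1, 2, 4)]
days = [0, 1, 2]

unsorted_result = crosstab_model(unsorted_rows, days)
sorted_result = crosstab_model(sorted(unsorted_rows, key=lambda r: r[0]), days)

print(len(unsorted_result), unsorted_result)
print(len(sorted_result), sorted_result)
```

With the unsorted input, week 1 is split into two output rows; with the sorted input it is a single row. Sorting the result afterwards, as the outer ORDER BY week does, reorders those rows but cannot merge them.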

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange