Question

I have a (possibly) basic question about how Postgres executes queries containing WITH clauses. I'm wondering whether including extraneous tables in a WITH clause actually slows down the query. That is, if the "temporary" table created in a WITH clause is never called outside of a WITH clause, is that "temporary" table actually created?

In the first example, I am joining two "temporary" tables created using WITH clauses:

--Example 1
WITH temp1 as (
SELECT * from table_1
),
temp2 as (
select * from table_2
)
select * 
from temp1
join temp2 on temp1.id = temp2.id; -- join column "id" is assumed; a plain JOIN needs a condition

In the second example, I'm doing the exact same query except there is an extraneous table "temp3" created in the WITH clause.

--Example 2
WITH temp1 as (
SELECT * from table_1
),
temp2 as (
select * from table_2
),
temp3 as (
select * from table_3
)
select * 
from temp1
join temp2 on temp1.id = temp2.id; -- join column "id" is assumed; a plain JOIN needs a condition

Is there any performance difference between these two queries? If table_3 is a huge table, will this slow down the query in example 2 vs. example 1? If not, why not?

It seems like it does not affect the query time. I'm still curious as to why, though ...


Solution

You can use EXPLAIN to see how the query planner will handle your query.

http://www.postgresql.org/docs/9.2/static/sql-explain.html

In the case above, PostgreSQL should see that temp3 is never referenced and leave it out of the plan.

Using your example on one of my databases:

explain with temp1 as (select * from cidrs), temp2 as (select * from contacts), temp3 as ( select * from accounts )  select * from temp1 join temp2 on temp1.id = temp2.id;
                             QUERY PLAN
---------------------------------------------------------------------
 Hash Join  (cost=22.15..25.44 rows=20 width=4174)
   Hash Cond: (temp1.id = temp2.id)
   CTE temp1
     ->  Seq Scan on cidrs  (cost=0.00..11.30 rows=130 width=588)
   CTE temp2
     ->  Seq Scan on contacts  (cost=0.00..10.20 rows=20 width=3586)
   ->  CTE Scan on temp1  (cost=0.00..2.60 rows=130 width=588)
   ->  Hash  (cost=0.40..0.40 rows=20 width=3586)
         ->  CTE Scan on temp2  (cost=0.00..0.40 rows=20 width=3586)
(9 rows)

You will notice no mention of temp3. To answer your edit about why it doesn't affect query time: the planner sees that temp3 is never referenced, so it is left out of the plan and never computed. Skipping work that cannot affect the result is exactly what an optimizer is for.
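
To check this against the tables from the question, something like the following could be run. It is only a sketch: the "id" join column is an assumption, and note that EXPLAIN (ANALYZE) actually executes the query rather than just planning it.

```sql
-- Assumes table_1, table_2 and table_3 exist and that table_1 and table_2
-- share an "id" column (hypothetical, as in the plan shown above).
EXPLAIN (ANALYZE)
WITH temp1 AS (
    SELECT * FROM table_1
),
temp2 AS (
    SELECT * FROM table_2
),
temp3 AS (
    SELECT * FROM table_3   -- never referenced by the main query
)
SELECT *
FROM temp1
JOIN temp2 ON temp1.id = temp2.id;
```

If no plan node for table_3 shows up in the output, the table was not scanned. This holds for a plain SELECT inside WITH; data-modifying statements (INSERT, UPDATE, DELETE) in a WITH clause are always executed, whether or not the main query reads their output.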

OTHER TIPS

You got the main answer from @Doon already.

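Since you are interested in performance, note that in PostgreSQL versions before 12, subqueries are typically faster than CTEs. There, Common Table Expressions (WITH queries) act as optimization barriers: each CTE is computed separately, and the planner does not push conditions from the outer query into it. Since version 12, a non-recursive, side-effect-free CTE that is referenced only once can be inlined like a subquery unless it is marked MATERIALIZED. Read this thread on pgsql-performance for details.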

Use CTEs ...

  • .. if a subquery is used in multiple places, to avoid repeated execution.
  • .. to keep the optimizer from merging the subquery into the main query (for performance, or so that functions with side effects are evaluated predictably).
  • .. to partition complex queries (readability, maintainability)
  • .. for RECURSIVE queries.
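
To see the difference between a CTE and a subquery, the plans can be compared directly. This is a rough sketch that assumes PostgreSQL 12 or later (where the MATERIALIZED and NOT MATERIALIZED keywords exist) and a hypothetical "id" column on table_1; the value 42 is arbitrary.

```sql
-- CTE computed on its own, as an optimization barrier
-- (the only behaviour available before version 12):
EXPLAIN
WITH temp1 AS MATERIALIZED (
    SELECT * FROM table_1
)
SELECT * FROM temp1 WHERE id = 42;

-- CTE the planner is allowed to fold into the outer query:
EXPLAIN
WITH temp1 AS NOT MATERIALIZED (
    SELECT * FROM table_1
)
SELECT * FROM temp1 WHERE id = 42;

-- The same query written as a plain subquery, for comparison:
EXPLAIN
SELECT *
FROM (SELECT * FROM table_1) AS temp1
WHERE id = 42;
```

In the first plan the filter should appear on a CTE Scan node above a full scan of table_1; in the other two the planner is free to apply the condition to table_1 directly and, if one exists, use an index on id.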
Licensed under: CC-BY-SA with attribution