Question

I was doing some performance benchmarking on some of my company's SQL, comparing PG10 to PG12. We use a lot of CTEs in our code, and PG12 didn't natively optimize the CTEs, so the performance was the same between PG10 and PG12.

My next experiment was to add the NOT MATERIALIZED directive to the CTEs and the result was astounding: it cut query times dramatically (halved them in some cases).

I read here that MATERIALIZED was the default behaviour prior to PG12, and that it writes the entire contents of the CTE to a temporary location.

So my question is mainly around NOT MATERIALIZED:

  1. What does the NOT MATERIALIZED functionality do with the data behind the scenes in contrast to MATERIALIZED?
  2. Are there any side effects to NOT MATERIALIZED I should be aware of before refactoring our codebase?

Solution

It's explained very well in the documentation.

A useful property of WITH queries is that they are normally evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects.
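To see the evaluate-once property with a function that has side effects, here is a minimal PostgreSQL sketch (the CTE name r is just illustrative). Writing random() = random() directly would compare two independent calls. Routing the call through a CTE that is referenced twice instead compares the single computed value with itself.

```sql
-- Minimal sketch (PostgreSQL): random() is volatile and the CTE r is
-- referenced twice. The CTE is evaluated once, so both references see
-- the same row and same_value is true.
WITH r AS (
    SELECT random() AS x
)
SELECT a.x = b.x AS same_value
FROM r AS a
CROSS JOIN r AS b;
```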

So far, so good, BUT:

However, the other side of this coin is that the optimizer is not able to push restrictions from the parent query down into a multiply-referenced WITH query, since that might affect all uses of the WITH query's output when it should affect only one. The multiply-referenced WITH query will be evaluated as written, without suppression of rows that the parent query might discard afterwards.

So, following the example given in the documentation, if you have a query like this:

WITH w AS (
    SELECT * FROM big_table  -- big_table has an INDEX on a field called key!
)
SELECT * FROM w AS w1 
  JOIN w AS w2 ON w1.key = w2.ref  -- w is referenced TWICE, so the DEFAULT is MATERIALIZED
                                   -- and PostgreSQL can't use the index on big_table.key
WHERE w2.key = 123;

So, in this case:

the WITH query will be materialized, producing a temporary copy of big_table that is then joined with itself, without benefit of any index

Far better to have:

WITH w AS NOT MATERIALIZED (
    SELECT * FROM big_table
)
SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref
WHERE w2.key = 123;

This lets the optimizer "fold" the CTE query "into" the main query and make use of the INDEX on the key field of big_table!
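To reproduce this, here is a sketch against a hypothetical temporary big_table (columns key and ref, 100,000 generated rows). The table and its data are assumptions for illustration, not taken from the question, and NOT MATERIALIZED requires PostgreSQL 12 or later. When you compare the two EXPLAIN outputs, the default plan should contain a "CTE w" subplan with "CTE Scan on w" nodes. The NOT MATERIALIZED plan should instead show index scans using big_table_pkey.

```sql
-- Hypothetical test data (PostgreSQL 12+): key is indexed via the primary
-- key, ref points at another row's key.
CREATE TEMP TABLE big_table (
    key integer PRIMARY KEY,
    ref integer NOT NULL
);
INSERT INTO big_table (key, ref)
SELECT g, (g % 100000) + 1
FROM generate_series(1, 100000) AS g;
ANALYZE big_table;

-- Default: w is referenced twice, so it is materialized and read through
-- "CTE Scan" nodes; the primary-key index is not usable there.
EXPLAIN
WITH w AS (
    SELECT * FROM big_table
)
SELECT * FROM w AS w1
  JOIN w AS w2 ON w1.key = w2.ref
WHERE w2.key = 123;

-- NOT MATERIALIZED: w is folded into the query, so both w2.key = 123
-- and w1.key = w2.ref can be looked up through big_table_pkey.
EXPLAIN
WITH w AS NOT MATERIALIZED (
    SELECT * FROM big_table
)
SELECT * FROM w AS w1
  JOIN w AS w2 ON w1.key = w2.ref
WHERE w2.key = 123;
```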

Regarding when NOT MATERIALIZED is the DEFAULT:

However, if a WITH query is non-recursive and side-effect-free (that is, it is a SELECT containing no volatile functions) then it can be folded into the parent query, allowing joint optimization of the two query levels. By default, this happens if the parent query references the WITH query just once, but not if it references the WITH query more than once.

So the DEFAULT is NOT MATERIALIZED if:

    the_query IS NOT recursive 
AND the_query is_side_effect_free 
AND the_query is_referenced_only_once

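otherwise (that is, when the query is referenced more than once) you have to ask PostgreSQL for NOT MATERIALIZED explicitly. For a recursive or non-side-effect-free WITH query, NOT MATERIALIZED is ignored.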
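A quick way to see the side-effect rule in action is this minimal PostgreSQL 12+ sketch (r is an illustrative name). Because random() is volatile, the CTE is still evaluated only once, even though it is explicitly marked NOT MATERIALIZED. Adding the keyword therefore does not change the results of queries whose CTEs call volatile functions.

```sql
-- Sketch (PostgreSQL 12+): NOT MATERIALIZED is ignored because the CTE
-- calls a volatile function, so r is still computed once and
-- same_value is true.
WITH r AS NOT MATERIALIZED (
    SELECT random() AS x
)
SELECT a.x = b.x AS same_value
FROM r AS a
CROSS JOIN r AS b;
```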

The only small problem that I see is that testing will be required to see whether NOT MATERIALIZED is an improvement or not. I can see circumstances where the balance will swing either way depending on table size, the fields selected, and the indexes on the fields and tables used in the CTE; in other words, there's no substitute for knowledge and experience. The DBA isn't dead and gone yet! :-)
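One way to do that testing is to run EXPLAIN (ANALYZE, BUFFERS) on both variants against realistic data. The sketch below uses a hypothetical sales table, invented for illustration. Both references to the CTE need its full aggregated output, so NOT MATERIALIZED gains nothing from indexes here and computes the GROUP BY twice. The plan should show two "Seq Scan on sales" nodes instead of one. Compare the reported execution times and buffer counts on your own data rather than assuming either variant wins.

```sql
-- Hypothetical table for illustration (PostgreSQL 12+); substitute your
-- own tables and CTEs when testing.
CREATE TEMP TABLE sales (
    id          integer PRIMARY KEY,
    customer_id integer NOT NULL
);
INSERT INTO sales (id, customer_id)
SELECT g, (g % 1000) + 1
FROM generate_series(1, 100000) AS g;
ANALYZE sales;

-- MATERIALIZED: the GROUP BY runs once and both references read the
-- stored result ("CTE Scan on per_customer").
EXPLAIN (ANALYZE, BUFFERS)
WITH per_customer AS MATERIALIZED (
    SELECT customer_id, count(*) AS n
    FROM sales
    GROUP BY customer_id
)
SELECT customer_id, n
FROM per_customer
WHERE n > (SELECT avg(n) FROM per_customer);

-- NOT MATERIALIZED: the CTE is inlined at both references, so sales is
-- scanned and aggregated twice, and no index can narrow either scan.
EXPLAIN (ANALYZE, BUFFERS)
WITH per_customer AS NOT MATERIALIZED (
    SELECT customer_id, count(*) AS n
    FROM sales
    GROUP BY customer_id
)
SELECT customer_id, n
FROM per_customer
WHERE n > (SELECT avg(n) FROM per_customer);
```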

OTHER TIPS

The only side effect should be performance-related (which I guess makes it the main effect, not a side effect). If there are any other side effects, those would have to be bugs. It is a bit weird that there is no "let the planner decide" setting.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange