Question

When I join two tables on a composite (two-column) primary key, I get bad cardinality estimates in the query plan. Example:

CREATE TABLE t1 AS SELECT x, x*2 AS x2 FROM generate_series(0, 1000) AS x;
ALTER TABLE t1 ADD PRIMARY KEY(x, x2);
ANALYZE t1;

CREATE TABLE t2 AS SELECT x, x*2 AS x2 FROM generate_series(0, 1000) AS x;
ALTER TABLE t2 ADD FOREIGN KEY (x, x2) REFERENCES t1(x,x2);
ANALYZE t2;

EXPLAIN ANALYZE
SELECT *
FROM t1 JOIN t2 USING (x, x2);

 QUERY PLAN                                                                                                    
 ------------------------------------------------------------------------------------------------------------- 
 Hash Join  (cost=30.02..52.55 rows=1 width=8) (actual time=0.660..1.551 rows=1001 loops=1)                    
   Hash Cond: ((t1.x = t2.x) AND (t1.x2 = t2.x2))                                                              
   ->  Seq Scan on t1  (cost=0.00..15.01 rows=1001 width=8) (actual time=0.021..0.260 rows=1001 loops=1)       
   ->  Hash  (cost=15.01..15.01 rows=1001 width=8) (actual time=0.620..0.620 rows=1001 loops=1)                
         Buckets: 1024  Batches: 1  Memory Usage: 40kB                                                         
         ->  Seq Scan on t2  (cost=0.00..15.01 rows=1001 width=8) (actual time=0.019..0.230 rows=1001 loops=1) 
 Total runtime: 1.679 ms    

The planner estimates one row, but the join actually returns 1001. That is harmless in a simple query like this one, but in complex queries the misestimate leads to very slow plans. How can I help the query optimizer produce a better estimate?


Solution

A composite primary key in which one column is completely dependent on the other is an "interesting" design.

In any case, PostgreSQL currently assumes that the selectivities of the individual columns are independent of each other, and so multiplies them together. It does this regardless of whether the columns are in the same index, even when that index is the primary key, and I don't know of a good way around that.
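To make the arithmetic concrete, here is a rough Python sketch of that independence assumption applied to the tables in the question. It assumes each equality clause gets a selectivity of 1 / max(number of distinct values on either side), which is my understanding of the usual estimate when no most-common-value statistics apply. The function name is made up for illustration; this is a toy model, not PostgreSQL code.

```python
def estimate_join_rows(rows_left, rows_right, distinct_per_column):
    """Toy model of an equi-join row estimate under the independence assumption.

    distinct_per_column holds one (n_distinct_left, n_distinct_right) pair
    per join column. Assumption: each equality clause has selectivity
    1 / max(n_distinct_left, n_distinct_right), and the per-column
    selectivities are simply multiplied together.
    """
    selectivity = 1.0
    for nd_left, nd_right in distinct_per_column:
        selectivity *= 1.0 / max(nd_left, nd_right)
    return rows_left * rows_right * selectivity


# t1 and t2 from the question: 1001 rows each, every column unique.
one_column = estimate_join_rows(1001, 1001, [(1001, 1001)])
two_columns = estimate_join_rows(1001, 1001, [(1001, 1001), (1001, 1001)])

print(round(one_column), round(two_columns))  # prints: 1001 1
```

Under these assumptions, joining on x alone gives the right answer, and adding the second, fully dependent column divides the estimate by another 1001, which matches the rows=1 in the plan above.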

You can use this circumlocution to get closer to the true selectivity:

EXPLAIN ANALYZE
SELECT *
FROM t1 JOIN t2 ON (t1.x = t2.x AND t1.x2 BETWEEN t2.x2 AND t2.x2);
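One way to see why this helps: BETWEEN expands into two inequalities between columns of different tables, and as far as I know the planner cannot estimate those from column statistics, so it falls back to a fixed default guess. The Python sketch below is a toy model under that assumption, using a default of 1/3 per inequality. The constant and function names are illustrative, and the real estimate may differ between PostgreSQL versions.

```python
# Assumed fallback selectivity for an inequality between two join columns.
DEFAULT_INEQ_SEL = 1.0 / 3.0


def estimate_workaround_rows(rows_left, rows_right, n_distinct):
    """Toy estimate for: t1.x = t2.x AND t1.x2 BETWEEN t2.x2 AND t2.x2."""
    equality_sel = 1.0 / n_distinct        # t1.x = t2.x
    between_sel = DEFAULT_INEQ_SEL ** 2    # t1.x2 >= t2.x2 AND t1.x2 <= t2.x2
    return rows_left * rows_right * equality_sel * between_sel


estimate = estimate_workaround_rows(1001, 1001, 1001)
print(round(estimate))  # prints: 111
```

In this model the estimate is still too low, but roughly 111 rows is far closer to the true 1001 than an estimate of 1, which can be enough to steer the planner away from the worst plans.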

OTHER TIPS

Another option is to make the key columns truly orthogonal, so that the planner's independence assumption actually holds:

CREATE TABLE t1 AS SELECT x/100 AS x, x%100 AS x2 FROM generate_series(0, 10000) AS x;
ALTER TABLE t1 ADD PRIMARY KEY(x, x2);
ANALYZE t1;

CREATE TABLE t2 AS SELECT x/100 AS x, x%100 AS x2 FROM generate_series(0, 10000) AS x;
ALTER TABLE t2 ADD PRIMARY KEY (x, x2); -- added PK
ALTER TABLE t2 ADD FOREIGN KEY (x, x2) REFERENCES t1(x, x2);
ANALYZE t2;
Licensed under: CC-BY-SA with attribution