Why does BETWEEN use a btree index but “element is contained by” range operator (<@) does not? [duplicate]

https://dba.stackexchange.com/questions/285497

16-03-2021
|

Question

I've got a table with a utc timestamptz which has a "btree" index on the utc column:

CREATE TABLE foo(utc timestamptz)

CREATE INDEX ix_foo_utc ON foo (utc);

This table contains about 500 million rows of data.

When I filter utc using BETWEEN, the query planner uses the index as expected:

> EXPLAIN ANALYZE
SELECT
   utc
FROM foo
WHERE
    utc BETWEEN '2020-12-01' AND '2031-02-15'
;

QUERY PLAN
Bitmap Heap Scan on foo  (cost=3048368.34..11836322.22 rows=143671392 width=8) (actual time=12447.905..165576.664 rows=150225530 loops=1)
Recheck Cond: ((utc >= '2020-12-01 00:00:00+00'::timestamp with time zone) AND (utc <= '2031-02-15 00:00:00+00'::timestamp with time zone))
Rows Removed by Index Recheck: 543231
Heap Blocks: exact=43537 lossy=1818365
->  Bitmap Index Scan on ix_foo_utc  (cost=0.00..3012450.49 rows=143671392 width=0) (actual time=12436.236..12436.236 rows=150225530 loops=1)
Index Cond: ((utc >= '2020-12-01 00:00:00+00'::timestamp with time zone) AND (utc <= '2031-02-15 00:00:00+00'::timestamp with time zone))
Planning time: 0.127 ms
Execution time: 172335.517 ms

But if I run the same query using a range operator, the index isn't used:

> EXPLAIN ANALYZE
SELECT
   utc
FROM quotation.half_hour_data
WHERE
    utc <@ tstzrange('2020-12-01', '2031-02-15')
;

QUERY PLAN
Gather  (cost=1000.00..9552135.30 rows=2556133 width=8) (actual time=0.179..145303.094 rows=150225530 loops=1)
Workers Planned: 2
Workers Launched: 2
->  Parallel Seq Scan on foo  (cost=0.00..9295522.00 rows=1065055 width=8) (actual time=5.321..117837.452 rows=50075177 loops=3)
"Filter: (utc <@ '[""2020-12-01 00:00:00+00"",""2031-02-15 00:00:00+00"")'::tstzrange)
Rows Removed by Filter: 120333718
Planning time: 0.069 ms
Execution time: 153384.494 ms

I would have expected the query planner to realise that these are doing the same operation (albeit that <@ is right-hand exlusive and BETWEEN is inclusive.)

So why are these query plans so different? (Forget about asking why the sequential scan query completes more quickly??!!)

My Postgres version:

"PostgreSQL 10.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit"

Solution

An index can only support operators that belong to its operator class.

SELECT ao.amoplefttype::regtype,
       op.oprname,
       ao.amoprighttype::regtype
FROM pg_opfamily AS of
   JOIN pg_am AS am ON of.opfmethod = am.oid
   JOIN pg_amop AS ao ON of.oid = ao.amopfamily
   JOIN pg_operator AS op ON ao.amopopr = op.oid
WHERE am.amname = 'btree'
  AND ao.amoplefttype = 'timestamptz'::regtype;

       amoplefttype       | oprname |        amoprighttype        
--------------------------+---------+-----------------------------
 timestamp with time zone | <       | date
 timestamp with time zone | <=      | date
 timestamp with time zone | =       | date
 timestamp with time zone | >=      | date
 timestamp with time zone | >       | date
 timestamp with time zone | <       | timestamp without time zone
 timestamp with time zone | <=      | timestamp without time zone
 timestamp with time zone | =       | timestamp without time zone
 timestamp with time zone | >=      | timestamp without time zone
 timestamp with time zone | >       | timestamp without time zone
 timestamp with time zone | <       | timestamp with time zone
 timestamp with time zone | <=      | timestamp with time zone
 timestamp with time zone | =       | timestamp with time zone
 timestamp with time zone | >=      | timestamp with time zone
 timestamp with time zone | >       | timestamp with time zone
(15 rows)

There is no <@ operator in there, so a B-tree index cannot support this operator.

GIN indexes can support <@, but not with the constant on the right side.

You will have to rewrite the query and use BETWEEN.

Remark: There is not a fundamental reason why it is like that, it is just how indexes work in PostgreSQL. It is even possible to write an optimizer support function that would do exactly that, but your use case is so exotic that PostgreSQL doesn't want to spend optimizer time and development effort on it.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange