Why is an unindexed range operator (<@) faster than using BETWEEN with an index?

https://dba.stackexchange.com/questions/285833

16-03-2021

Question

NB: This uses the same setup as this question; here I'm asking about something I specifically did not ask about there.

I've got a table with a column utc timestamptz, with a "btree" index on the utc column:

CREATE TABLE foo(utc timestamptz);

CREATE INDEX ix_foo_utc ON foo (utc);

This table contains about 500 million rows of data.

When I filter utc using BETWEEN, the query planner uses the index as expected:

> EXPLAIN ANALYZE
SELECT
   utc
FROM foo
WHERE
    utc BETWEEN '2020-12-01' AND '2031-02-15'
;

Bitmap Heap Scan on foo  (cost=3048368.34..11836322.22 rows=143671392 width=8) (actual time=12447.905..165576.664 rows=150225530 loops=1)
  Recheck Cond: ((utc >= '2020-12-01 00:00:00+00'::timestamp with time zone) AND (utc <= '2031-02-15 00:00:00+00'::timestamp with time zone))
  Rows Removed by Index Recheck: 543231
  Heap Blocks: exact=43537 lossy=1818365
  ->  Bitmap Index Scan on ix_foo_utc  (cost=0.00..3012450.49 rows=143671392 width=0) (actual time=12436.236..12436.236 rows=150225530 loops=1)
     Index Cond: ((utc >= '2020-12-01 00:00:00+00'::timestamp with time zone) AND (utc <= '2031-02-15 00:00:00+00'::timestamp with time zone))
Planning time: 0.127 ms
Execution time: 172335.517 ms

I could write the same query using a range operator without an index at all:

> EXPLAIN ANALYZE
SELECT
   utc
FROM foo
WHERE
    utc <@ tstzrange('2020-12-01', '2031-02-15')
;

Gather  (cost=1000.00..9552135.30 rows=2556133 width=8) (actual time=0.179..145303.094 rows=150225530 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on foo  (cost=0.00..9295522.00 rows=1065055 width=8) (actual time=5.321..117837.452 rows=50075177 loops=3)
      Filter: (utc <@ '["2020-12-01 00:00:00+00","2031-02-15 00:00:00+00")'::tstzrange)
      Rows Removed by Filter: 120333718
Planning time: 0.069 ms
Execution time: 153384.494 ms

These are doing the same operation (except that <@ is exclusive at the upper bound and BETWEEN is inclusive).
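
To make the boundary difference concrete, here is a minimal sketch. It relies on tstzrange() defaulting to '[)' bounds, which matches the Filter line in the plan above. The last query is an illustrative rewrite of the <@ filter as plain comparisons; it is not one of the queries that was timed.

```sql
-- Default bounds are '[)', so the upper endpoint is excluded: returns false.
SELECT '2031-02-15'::timestamptz <@ tstzrange('2020-12-01', '2031-02-15');

-- With inclusive bounds the range behaves like BETWEEN: returns true.
SELECT '2031-02-15'::timestamptz <@ tstzrange('2020-12-01', '2031-02-15', '[]');

-- Half-open comparison that selects the same rows as the default <@ filter.
SELECT utc
FROM foo
WHERE utc >= '2020-12-01'
  AND utc <  '2031-02-15';
```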

How can the unindexed query with <@ be faster than the indexed query with BETWEEN?

Surely, if ignoring the index is faster, the query planner should know that in advance?

Or is this something specific to the amount of memory my PG instance has and the size of the query (big!)?

My Postgres version:

"PostgreSQL 10.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit"

Solution

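First, the index scan is slower than necessary, because your work_mem is not big enough to contain the bitmap. The plan shows this as "Heap Blocks: exact=43537 lossy=1818365": for a lossy block the bitmap only records the page, so every row on that page has to be rechecked against the condition. Increase work_mem until you get no more “lossy” heap blocks during the bitmap heap scan.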
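
A minimal sketch of that experiment, run in a single session. The value 256MB is only a placeholder assumption; the right setting depends on the data and on available memory, so step it up until the "lossy" count disappears.

```sql
-- Check the current setting first.
SHOW work_mem;

-- Raise it for this session only; 256MB is a placeholder, not a recommendation.
SET work_mem = '256MB';

-- Re-run the indexed query and inspect the "Heap Blocks" line of the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT utc
FROM foo
WHERE utc BETWEEN '2020-12-01' AND '2031-02-15';

-- Go back to the configured default when done.
RESET work_mem;
```

If the "Heap Blocks" line still reports lossy blocks, the bitmap did not fit in the new work_mem either.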

The second plan is faster than the first because it uses two parallel workers (which the other plan cannot). But it uses way more resources: it keeps three processes busy for 117837 milliseconds each (plus an additional 27465 milliseconds to collect the results), while the bitmap heap scan runs in a single process and is done in 153140 milliseconds once the bitmap has been built.
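
The comparison can be sketched as simple arithmetic on the figures from the two plans. This is a rough estimate: it treats the per-loop "actual time" of the Parallel Seq Scan as the busy time of each of the three processes and ignores the Gather overhead.

```sql
SELECT
    3 * 117837                      AS parallel_scan_process_ms,  -- 353511
    153140                          AS bitmap_heap_scan_ms,
    round(3 * 117837 / 153140.0, 2) AS process_time_ratio;
```

By this estimate the parallel sequential scan spends about 2.3 times as much process time as the bitmap heap scan, even though it finishes sooner on the wall clock.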

Licensed under: CC-BY-SA with attribution
