Question

We import currency rates into DB:

CREATE TABLE currency_rate (
    id int8 NOT NULL,
    date date NOT NULL,
    currency varchar(3) NOT NULL,
    rate numeric(12,6) NOT NULL,
    CONSTRAINT currency_rate_pk PRIMARY KEY (id)
);

ALTER TABLE currency_rate add constraint currency_rate_un UNIQUE (currency, date);

but actually we need only the latest available rate to work with.

It is cumbersome to write CTE with sort and distinct on (currency):

with cr as (
  select distinct on (currency) currency, rate from currency_rate
  order by currency, date)
select
  ...,
  sum((nd.original_amount - nd.new_amount)*cr.rate) as amount
from notification_data nd
join cr on cr.currency = nd.currency
...

Query has following execution plan that is nice:

  CTE cr
    ->  Result  (cost=0.28..69.66 rows=13 width=16)
          ->  Unique  (cost=0.28..69.66 rows=13 width=16)
                ->  Index Scan using currency_rate_un on currency_rate  (cost=0.28..67.17 rows=995 width=16)
  ...
              ->  Hash Join  (cost=1029.26..57129.68 rows=18 width=60)
                    Hash Cond: ((nd.currency)::text = (cr.currency)::text)

I created view:

CREATE OR REPLACE VIEW latest_currency_rate AS
SELECT
  DISTINCT ON (currency) currency, rate, date
FROM currency_rate
ORDER BY currency, date DESC;

but DB optimizer doesn't use index from currency_rate_un:

explain select * from latest_currency_rate;

Unique  (cost=60.83..65.38 rows=12 width=16)
  ->  Sort  (cost=60.83..63.10 rows=910 width=16)
        Sort Key: currency_rate.currency, currency_rate.date DESC
        ->  Seq Scan on currency_rate  (cost=0.00..16.10 rows=910 width=16)

and even for:

explain select * from latest_currency_rate where currency = 'USD';

Unique  (cost=16.87..17.13 rows=12 width=16)
  ->  Sort  (cost=16.87..17.13 rows=104 width=16)
        Sort Key: currency_rate.date DESC
        ->  Bitmap Heap Scan on currency_rate  (cost=5.08..13.38 rows=104 width=16)
              Recheck Cond: ((currency)::text = 'USD'::text)
              ->  Bitmap Index Scan on currency_rate_un  (cost=0.00..5.06 rows=104 width=0)
                    Index Cond: ((currency)::text = 'USD'::text)

Integration of new view to original query gives:

explain select
  sum((nd.original_amount - nd.new_amount)*cr.rate) as amount
from notification_data nd
join latest_currency_rate cr on cr.currency = nd.currency
...

...
 ->  Hash  (cost=73.54..73.54 rows=13 width=12)
       ->  Subquery Scan on cr  (cost=68.37..73.54 rows=13 width=12)
             ->  Unique  (cost=68.37..73.41 rows=13 width=16)
                   ->  Sort  (cost=68.37..70.89 rows=1008 width=16)
                         Sort Key: currency_rate.currency, currency_rate.date DESC
                         ->  Seq Scan on currency_rate  (cost=0.00..18.08 rows=1008 width=16)
...

Now I am puzzled. Why original CTE query uses Index Scan and view doesn't use the same index?

Should I rewrite view with some alternative trick (instead of distinct on)?

I am thinking about going with materialized view to avoid sequential scans...

Was it helpful?

Solution

Your view gives the correct answer, while your CTE gives the wrong answer, using the oldest date, not the newest.

If you want to use an index scan (although in my hands it doesn't actually make a difference in performance), specify DESC for both ORDER BY columns, or create an index for (currency, date DESC).

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top