PostgreSQL: inner join takes 70 seconds

https://stackoverflow.com/questions/13088407

14-07-2021

Question

I have two tables -

Table A : 1MM rows, AsOfDate, Id, BId (foreign key to table B)

Table B : 50k rows, Id, BId, Flag, ValidFrom, ValidTo

Table A contains multiple records per day between 2011/01/01 and 2011/12/31 across 100 BIds. Table B contains multiple non-overlapping (between ValidFrom and ValidTo) records for the same 100 BIds.

The join should return the flag that was active for the BId on the given AsOfDate.

select 
    a.AsOfDate, b.Flag 
from 
    A a inner Join B b on 
        a.BId = b.BId and b.ValidFrom <= a.AsOfDate and b.ValidTo >= a.AsOfDate
where
    a.AsOfDate >= 20110101 and a.AsOfDate <= 20111231

This query takes ~70 seconds on a very high-end server (3+ GHz) with 64 GB of memory.

I have tried indexes on every combination of fields while testing this, to no avail.

Indexes on A : a.AsOfDate, a.AsOfDate+a.BId, a.BId
Indexes on B : b.BId, b.BId+b.ValidFrom

I also tried the range queries suggested below (62 seconds).

The same query on the free version of SQL Server running in a VM takes ~1 second to complete.

Any ideas?

Postgres 9.2

Query Plan

                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Aggregate  (cost=8274298.83..8274298.84 rows=1 width=0)
   ->  Hash Join  (cost=1692.25..8137039.36 rows=54903787 width=0)
         Hash Cond: (a.bid = b.bid)
         Join Filter: ((b.validfrom <= a.asofdate) AND (b.validto >= a.asofdate))
         ->  Seq Scan on "A" a  (cost=0.00..37727.00 rows=986467 width=12)
               Filter: ((asofdate > 20110101) AND (asofdate < 20111231))
         ->  Hash  (cost=821.00..821.00 rows=50100 width=12)
               ->  Seq Scan on "B" b  (cost=0.00..821.00 rows=50100 width=12)

See http://explain.depesz.com/s/1c5 for the EXPLAIN ANALYZE output.
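For anyone trying to reproduce a plan like this, a minimal sketch follows. It assumes the Aggregate node in the plan came from wrapping the join in count(*), which the question does not state, and it writes the table names unquoted even though the plan shows them as "A" and "B". The plan's filter also uses strict comparisons (> and <) where the posted query uses >= and <=, so the timed statement may have differed slightly from this one.

```sql
-- Assumption: the Aggregate node comes from a count(*) wrapper around the join.
-- Assumption: tables are reachable as a and b; the plan shows quoted "A" and "B".
explain (analyze, buffers)
select count(*)
from a
    inner join b on
        a.bid = b.bid and b.validfrom <= a.asofdate and b.validto >= a.asofdate
where
    a.asofdate >= 20110101 and a.asofdate <= 20111231;
```

Comparing the estimated row count of the Hash Join (about 54.9 million here) with the actual row count reported by ANALYZE is usually the first thing to check in the output.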

The query plan from SQL Server for the same query was posted as an image in the original question.

Solution 2

The issue was with the indexes. For some reason that is unclear to me, the indexes on the tables were not being used correctly by the query planner. I removed them all and added them back (exactly the same, via script), and the query now takes ~303 ms.

Thanks for all the help on this very frustrating problem.
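The script itself was not posted, so the following is only a sketch of what dropping and recreating the indexes might look like. The index names are hypothetical; the column lists follow the indexes named in the question. The final ANALYZE statements are an addition that the answer does not mention: they refresh the planner statistics, which is a separate step from rebuilding the indexes and may or may not have been a factor here.

```sql
-- Hypothetical index names; columns taken from the index list in the question.
-- Tables are written unquoted; the plan shows them as "A" and "B".
drop index if exists a_asofdate_idx, a_asofdate_bid_idx, a_bid_idx;
drop index if exists b_bid_idx, b_bid_validfrom_idx;

create index a_asofdate_idx      on a (asofdate);
create index a_asofdate_bid_idx  on a (asofdate, bid);
create index a_bid_idx           on a (bid);
create index b_bid_idx           on b (bid);
create index b_bid_validfrom_idx on b (bid, validfrom);

-- Not part of the original answer: refresh planner statistics after the rebuild.
analyze a;
analyze b;
```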

Other suggestions

Consider using the range types available in PostgreSQL 9.2:

create index on a using gist(int4range(asofdate, asofdate, '[]'));
create index on b using gist(int4range(validfrom, validto, '[]'));

You can query for dates that fall within a given range like so:

select * from a
where int4range(asofdate,asofdate,'[]') && int4range(20110101, 20111231, '[]');

And for rows in b overlapping a record in a like so:

select *
from b
    join a on int4range(b.validfrom,b.validto,'[]') @> a.asofdate
where a.id = 1;

(&& means "overlaps", @> means "contains", and '[]' creates a range that includes both end points.)
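Neither query above includes the BId match from the original join. One possible way to combine the two conditions, sketched here and not taken from the answer, is a multicolumn GiST index built with the btree_gist extension, so that the equality on bid and the range containment can be served by the same index. The index name is hypothetical, and whether the planner actually prefers this index over a hash join depends on the data and statistics.

```sql
-- btree_gist lets a plain integer column (bid) take part in a GiST index.
create extension if not exists btree_gist;

-- Hypothetical index name; combines the BId match with the validity range.
create index b_bid_validity_gist
    on b using gist (bid, int4range(validfrom, validto, '[]'));

-- The original join rewritten with range containment.
select a.asofdate, b.flag
from a
    join b on a.bid = b.bid
          and int4range(b.validfrom, b.validto, '[]') @> a.asofdate
where a.asofdate between 20110101 and 20111231;
```

The range expression in the query has to match the indexed expression exactly, including the '[]' bounds, for the index to be considered.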

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow