Question

I have two huge tables:

tbl_a
id(int), cnt(int default:0)

tbl_b
id(int), a_id(int)

I need to count all rows with the same a_id in tbl_b and put that value into tbl_a.

Here is the way I found:

update tbl_a
set cnt = tb.c
from (select count(*) c,a_id from tbl_b group by a_id) tb
where tb.a_id = tbl_a.id

But that query takes about 8 s per 1000 rows, which is not acceptable since I have about 6M rows.

I tried creating a temp table

... AS (select count(*),a_id from tbl_b group by a_id)

and even added a b-tree index, but nothing changed.

Can this be made faster on the same hardware?

UPD1:

"Update on tbl_a  (cost=0.00..343357.80 rows=40000 width=459)"
"  ->  Nested Loop  (cost=0.00..343357.80 rows=40000 width=459)"
"        ->  Seq Scan on tbl_b_temp  (cost=0.00..617.00 rows=40000 width=18)"
"        ->  Index Scan using tbl_a_pkey on tbl_a  (cost=0.00..8.55 rows=1 width=445)"
"              Index Cond: (id = tbl_b_temp.a_id)"

rows=40000 because I created a smaller temp table.

Query:

create temporary table tbl_b_temp as 
select count(*) as c, a_id from tbl_b group by a_id order by a_id limit 40000;

CREATE INDEX a_id_ind on tbl_b using btree (a_id);

Solution

When working with a (big) temporary table, be sure to run ANALYZE after creating or changing it, since autovacuum does not cover temporary tables. This may improve the query plan Postgres comes up with.

Quoting the manual:

Temporary tables cannot be accessed by autovacuum. Therefore, appropriate vacuum and analyze operations should be performed via session SQL commands.
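Putting it together, here is a minimal sketch of the complete sequence this advice suggests, reusing the table and column names from the question (the temp-table name tbl_b_temp is taken from the UPD1 example):

-- build the aggregate once in a temporary table
create temporary table tbl_b_temp as
select a_id, count(*) as c
from tbl_b
group by a_id;

-- autovacuum never touches temp tables, so gather statistics manually
analyze tbl_b_temp;

-- update from the pre-aggregated counts
update tbl_a
set cnt = tb.c
from tbl_b_temp tb
where tb.a_id = tbl_a.id;

With fresh statistics, the planner can estimate the join between tbl_b_temp and tbl_a realistically instead of falling back on default assumptions, which may change the plan it chooses.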

Licensed under: CC-BY-SA with attribution