Question

I have a table with over 100 million rows. I have to count and extract rows as in the following query, but the query runs very slowly. EXPLAIN shows that the query doesn't use the b-tree index created on the "created_date" column. I found an explanation on Stack Overflow that b-tree indexes are useless for filtering when the table has many rows.

There is advice to cluster the table on an index. Should I cluster the table on the "created_date" index if I also often run queries where I ORDER BY id?

What would you advise to make the queries faster? Maybe I should read more about sharding?

 EXPLAIN SELECT count(r.id) FROM results_new r
 WHERE r.searches_id = 4351940 AND (created_date BETWEEN '2008-01-01' AND '2012-12-13')


Limit  (cost=1045863.78..1045863.79 rows=1 width=4)
   ->  Aggregate  (cost=1045863.78..1045863.79 rows=1 width=4)
    ->  Index Scan using results_new_searches_id_idx on results_new r  (cost=0.00..1045012.38 rows=340560 width=4)
          Index Cond: (searches_id = 4351940)
          Filter: ((created_date >= '2008-01-01 00:00:00'::timestamp without time zone) AND (created_date <= '2012-12-13 00:00:00'::timestamp without time zone))

Solution

From the look of it, the database has decided that a lookup for one searches_id will produce fewer rows to go through than a lookup for the created_date range (and that combining the results of two index scans with a bitmap isn't worthwhile). So it scans the searches_id index and applies the date range as a filter on the fetched rows.

If you need this query often, consider creating a composite index on (searches_id, created_date); then both conditions can go into the index condition, and the date range no longer needs a separate filter step.
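A minimal sketch of that composite index, using the table and column names from the query above (PostgreSQL syntax; the index name is just a suggested convention):

```sql
-- Composite b-tree index: the equality column first, the range column
-- second, so the planner can satisfy both predicates from the index.
CREATE INDEX results_new_searches_id_created_date_idx
    ON results_new (searches_id, created_date);

-- Refresh planner statistics so the new index is considered promptly.
ANALYZE results_new;
```

With this index in place, EXPLAIN should show both predicates under Index Cond rather than the date range appearing in a Filter line, which means far fewer heap rows are visited for the count.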

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow