Overriding cost parameters
You can't ever force PostgreSQL to use a particular index, or totally prevent it from doing a seqscan.
However, you can tell it to avoid doing certain scan types if it possibly can, by setting the relevant enable_ parameters to off. It's really a feature intended only for debugging.
For testing, try:
SET enable_seqscan = off;
With that set, if PostgreSQL can possibly use an index scan (or something else), it will.
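A minimal sketch of such a test session, assuming the table and column names from the partial index example in this answer (tbl_name_blah, factory_key, relevant). The value 42 is purely hypothetical. SET LOCAL keeps the override inside one transaction so it cannot leak into other queries on the same connection.

```sql
BEGIN;

-- Discourage sequential scans for this transaction only (debugging aid).
SET LOCAL enable_seqscan = off;

-- Compare this plan and its timing with the same query run without the override.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM tbl_name_blah
WHERE factory_key = 42   -- hypothetical value
  AND NOT relevant;

ROLLBACK;  -- the SET LOCAL setting is discarded with the transaction
```

If the forced plan turns out slower than the sequential scan, the planner's original choice was probably right.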
You may also want to consider:
SET random_page_cost = 1.1;
i.e. tell PostgreSQL that random I/O is only slightly more expensive than sequential I/O. This is usually true on systems with SSDs, or where most of the DB is cached in RAM. It will be more likely to choose an index in this case.
Of course, if your system's random I/O is actually more expensive, then using an index is likely to be slower.
Selectivity, partial indexes
What you should really do is follow the advice you've already been given. Create the index in order of selectivity - if relevant is less common, use that. You can even go a step further and create a partial index:
CREATE INDEX idx_name_blah ON tbl_name_blah (factory_key) WHERE (NOT relevant);
This index only contains values for relevant = 'f'. It can only be used for queries where the planner knows relevant will be false. On the other hand, it will be a much smaller, faster index.
Statistics
You might also have inaccurate statistics, causing PostgreSQL to think value frequencies are different from what they really are for your table. EXPLAIN ANALYZE will help show this: compare the estimated row counts with the actual ones.
You can also just ANALYZE my_table in case the stats are just out of date; if so, increase the frequency with which autovacuum runs because it's not keeping up.
If the stats are up to date but the planner is still making stats-based mis-estimations, increasing the statistics target for the affected columns (see manual) and re-analyzing can help.
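As a rough sketch of that last step, assuming the my_table and factory_key names used elsewhere in this answer: the target of 1000 and the literal 42 are illustrative values only, not recommendations.

```sql
-- Collect a larger sample for one column; the setting is stored per column.
ALTER TABLE my_table
    ALTER COLUMN factory_key SET STATISTICS 1000;  -- illustrative value

-- Rebuild the statistics so the new target takes effect.
ANALYZE my_table;

-- Re-check: the estimated "rows=" figures should now sit closer to the actual ones.
EXPLAIN ANALYZE
SELECT *
FROM my_table
WHERE factory_key = 42   -- hypothetical value
  AND NOT relevant;
```

A higher target makes ANALYZE and planning slightly slower, so it is usually raised only for the columns that are actually mis-estimated.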
Versions
Older PostgreSQL versions tend to be less smart about cost estimation, query optimization, statistics, query execution methods, and pretty much everything else.
If you're not on the latest version, upgrade.
For example, 9.2's index-only scans would allow you to create a partial index on (product_id, factory_key) WHERE (NOT relevant) and then run a query:
SELECT product_id, factory_key FROM my_table WHERE NOT relevant;
that should only read the index, with little or no heap access, as long as the table is vacuumed often enough to keep its visibility map current.
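Spelled out, that might look like the sketch below. The index name is hypothetical. One caveat, stated as my understanding rather than something tested here: releases before 9.6 may decline to use an index-only scan when the predicate column (relevant) is not itself part of the index, so check the plan on your own version.

```sql
-- Partial index holding both columns the query returns.
CREATE INDEX idx_my_table_not_relevant   -- hypothetical index name
    ON my_table (product_id, factory_key)
    WHERE (NOT relevant);

-- VACUUM refreshes the visibility map that index-only scans rely on.
VACUUM ANALYZE my_table;

-- Look for an "Index Only Scan" node and a low "Heap Fetches" count in the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT product_id, factory_key
FROM my_table
WHERE NOT relevant;
```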