Question

I have a table with several million records in a PostgreSQL 12 database, and after an upgrade from 11 to 12, a few queries started performing horribly. They went from taking ~1 second to ~5 minutes. I tried rebuilding all indexes, vacuuming, and all usual Postgres low-hanging fruit, but performance is still terrible.

This is the query:

SELECT id, activity_count
FROM user
WHERE (search_index) @@ (to_tsquery('pg_catalog.english', '''1234567890'':*') AND active = true
ORDER BY activity_count DESC LIMIT 101

In other words, find all active users matching the given account number, and sort from most active to least.

This query takes about 5 minutes to return just 2 records. Something's not right.

The column search_index is a tsvector storing all the keywords from the table's various text fields (just things like account_number, name, etc).

I have a GIN index created for this column with:

CREATE INDEX user_search_index_gin
    ON public.user USING gin
    (search_index)
    TABLESPACE pg_default;

I also have an index for the active column with:

CREATE INDEX user_active
    ON public.user USING btree
    (active ASC NULLS LAST)
    TABLESPACE pg_default;

And I have an orderd index for the activity_count with:

CREATE INDEX user_activity_count
    ON public.user USING btree
    (activity_count ASC NULLS LAST)
    TABLESPACE pg_default;

Yet when I run EXPLAIN, I get:

"Limit  (cost=0.56..11443.66 rows=101 width=1552)"
"  ->  Index Scan Backward using user_activity_count on user  (cost=0.56..36010185.91 rows=317836 width=1552)"
"        Filter: (active AND (search_index @@ '''1234567890'':*'::tsquery))"

Why is it only using the user_activity_count index and not the more efficient GIN index? How do I fix this?

Was it helpful?

Solution

Which index to use?

While your WHERE conditions are not very selective, the current query plan makes a lot of sense with ORDER BY activity_count DESC LIMIT 101. See:

However, your predicates strike me as pretty selective:

WHERE (search_index) @@ (to_tsquery('pg_catalog.english', '''1234567890'':*') AND active = true

That's unless your example '1234567890' is misleading. (Are you sure you want the single quotes in the search term?) It's obviously not the actual query you used. Unmatched parentheses. In any case, you probably want the "simple" text search configuration (plus some improvements):

WHERE  search_index @@ to_tsquery('pg_catalog.simple', '''1234567890'':*')
AND    active

See:

Statistics

So it seems your statistics are not up to date.

after an upgrade from 11 to 12, a few queries starting performing horribly. They went from taking ~1 second to ~5 minutes. I tried rebuilding all indexes, vacuuming, and all usual Postgres low-hanging fruit, but performance is still terrible.

Why is it only using the user_activity_count index and not the more efficient GIN index?

Did you also add ANALYZE to your basket of all usual Postgres low-hanging fruit? You certainly didn't mention it. Consider instructions in the manual:

  1. Statistics

    Because optimizer statistics are not transferred by pg_upgrade, you will be instructed to run a command to regenerate that information at the end of the upgrade. You might need to set connection parameters to match your new cluster.

If your DB is big, you may be interested in vacuumdb with the option --analyze-in-stages. (That ship may have sailed for the case at hand.) The manual:

This option is useful to analyze a database that was newly populated from a restored dump or by pg_upgrade. This option will try to create some statistics as fast as possible, to make the database usable, and then produce full statistics in the subsequent stages.

Related:

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top