How to create an index to speed up an aggregate LIKE query on an expression?

https://dba.stackexchange.com/questions/4521

16-10-2019

Question

I may be asking the wrong question in the title. Here are the facts:

My customer service folk have been complaining about slow response times when doing customer lookups on the administration interface of our Django-based site.

We're using Postgres 8.4.6. I started logging slow queries, and discovered this culprit:

SELECT COUNT(*) FROM "auth_user" WHERE UPPER("auth_user"."email"::text) LIKE UPPER(E'%deyk%')

This query is taking upwards of 32 seconds to run. Here's the query plan provided by EXPLAIN:

QUERY PLAN
Aggregate  (cost=205171.71..205171.72 rows=1 width=0)
  ->  Seq Scan on auth_user  (cost=0.00..205166.46 rows=2096 width=0)
        Filter: (upper((email)::text) ~~ '%DEYK%'::text)

Because this is a query generated by the Django ORM from a Django QuerySet generated by the Django Admin application, I don't have any control over the query itself. An index seems like the logical solution. I tried creating an index to speed this up, but it hasn't made a difference:

CREATE INDEX auth_user_email_upper ON auth_user USING btree (upper(email::text))

What am I doing wrong? How can I speed up this query?

Solution

There is no index support for LIKE / ILIKE in PostgreSQL 8.4, except for left-anchored search terms.

Since PostgreSQL 9.1, the additional module pg_trgm provides operator classes for GIN and GiST trigram indexes that support LIKE / ILIKE. Support for regular expressions (operators ~ and friends) followed in PostgreSQL 9.3. Install once per database:

CREATE EXTENSION pg_trgm;

Example GIN index:

CREATE INDEX tbl_col_gin_trgm_idx ON tbl USING gin (col gin_trgm_ops);
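
Applied to the query in the question, the index would have to be built on the same expression the filter uses, upper(email::text), rather than on the bare column. A minimal sketch, assuming the server has been upgraded to 9.1 or later and pg_trgm is installed (the index name is made up for illustration):

```sql
-- Hypothetical index name. The indexed expression must match the
-- expression in the WHERE clause for the planner to consider it.
CREATE INDEX auth_user_email_upper_trgm_idx
    ON auth_user USING gin (upper(email::text) gin_trgm_ops);

-- Re-check the plan for the query Django generates.
EXPLAIN
SELECT count(*)
FROM   auth_user
WHERE  upper(email::text) LIKE upper('%deyk%');
```

Trigram indexes work best when the search term contains at least three consecutive characters; shorter terms give the index nothing to extract, so they are unlikely to benefit.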

OTHER TIPS

That index isn't going to help because of the '%' at the start of your match. A BTREE index can only match prefixes, and the wildcard at the start of your pattern means there is no fixed prefix to look for.

That's why it is doing a table scan and matching every record in turn against the query string.

You probably need to look at using a full text index and the text matching operators instead of the substring search with LIKE that you are doing at the moment. You can find more on full text searching in the documentation:

http://www.postgresql.org/docs/8.4/static/textsearch-intro.html
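
For reference, a full text index on this table might look roughly like the sketch below. The index name is hypothetical, and the 'simple' configuration is an assumption chosen to avoid language-specific stemming. Two caveats: full text search matches whole tokens rather than arbitrary substrings, so it is not a drop-in replacement for '%deyk%', and it only helps if the query itself can be changed to use the @@ operator.

```sql
-- Hypothetical index name; 'simple' is an assumed text search configuration.
CREATE INDEX auth_user_email_fts_idx
    ON auth_user USING gin (to_tsvector('simple', email));

-- The query must use the same expression and the @@ match operator.
SELECT count(*)
FROM   auth_user
WHERE  to_tsvector('simple', email) @@ to_tsquery('simple', 'deyk');
```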

In fact I notice from that page that LIKE apparently never uses indexes, which seems odd to me, as it ought to be able to resolve non-wildcard prefixes using a BTREE index. A few quick tests suggest that the documentation is probably correct, however, in which case no amount of indexing is going to help while you are using LIKE to resolve the query.
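
One possible explanation for those test results: in a database that does not use the C locale, a BTREE index built with the default operator class is not used for LIKE, but one built with text_pattern_ops can serve left-anchored patterns. A sketch under that assumption, with a made-up index name:

```sql
-- Hypothetical index name. text_pattern_ops compares strings character
-- by character, which lets LIKE 'prefix%' use the index in any locale.
CREATE INDEX auth_user_email_upper_pattern_idx
    ON auth_user (upper(email::text) text_pattern_ops);

-- Left-anchored pattern: the planner can use an index range scan.
EXPLAIN
SELECT count(*) FROM auth_user
WHERE  upper(email::text) LIKE 'DEYK%';

-- Leading wildcard: this index does not help, expect a sequential scan.
EXPLAIN
SELECT count(*) FROM auth_user
WHERE  upper(email::text) LIKE '%DEYK%';
```

This does not rescue the original query, since its pattern starts with a wildcard, but it shows that LIKE and BTREE indexes are not entirely incompatible.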

Licensed under: CC-BY-SA with attribution
