Question

I'm trying to create an index that will support queries that use my custom operator. This is on PostgreSQL 10.4.

The custom operator

I followed the tips in this SO answer to create an operator that performs "LIKE"-style matching on the elements of a text array.

CREATE FUNCTION reverse_like (text, text) returns boolean language sql 
as $$ select $2 like $1 $$;

CREATE OPERATOR <~~ ( function = reverse_like, leftarg = text, rightarg = text );

The above operator allows me to do things like

SELECT 'ab%' <~~ ANY('{"abc","def"}');

The schema, index and query

I have a table with web traffic visits called sessions which includes an array column.

CREATE TABLE sessions
(
   session_id    varchar(24) NOT NULL,
   first_seen    timestamp,
   domains       varchar[]
);

To query the domains column to see whether a given domain (or a partial or wildcarded domain name) was visited, I can do the following:

SELECT count(*)
FROM sessions
WHERE 'www.foo%' <~~ ANY(domains);

I want to speed up queries like the one above with a GIN index, so I created the index as follows:

CREATE INDEX idx_domains ON sessions USING GIN (domains);

The Question

Even after running ANALYZE on the table and setting enable_seqscan = false, I have had no luck getting Postgres to use this index: it always does a seqscan. It uses the index for array operators like @>, but not for my custom <~~ operator.
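The difference between the two cases can be checked by comparing plans. This is a minimal sketch, assuming the sessions table from the CREATE TABLE statement above and the GIN index on domains; the literal 'www.foo.com' is only an illustrative value.

```sql
-- Discourage sequential scans for this session only, so index use is visible.
SET enable_seqscan = off;

-- Built-in array containment: the default GIN operator class for arrays
-- supports @>, so the planner can use the GIN index here.
EXPLAIN
SELECT count(*)
FROM sessions
WHERE domains @> ARRAY['www.foo.com']::varchar[];

-- Custom operator: <~~ does not belong to any GIN operator class, so the
-- planner cannot turn this predicate into an index condition.
EXPLAIN
SELECT count(*)
FROM sessions
WHERE 'www.foo%' <~~ ANY(domains);

-- Restore the default planner setting.
RESET enable_seqscan;
```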

I think it's because the GIN index doesn't know how to handle my custom operator. Do I need to create an operator class and then create my index using that? Or should I create a functional index?


Solution 3

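Turns out I overcomplicated this by thinking about a GIN index. A b-tree index on the whole array column is enough to get rid of the sequential scan for queries that use the custom <~~ operator. Note that in the plan below the operator is applied as a Filter on an index-only scan rather than as an index condition, so the whole index is still read.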


CREATE INDEX IF NOT EXISTS idx_domains2 ON session (domains);

select count(*)
from session
where 'www.foo%' <~~ ANY(domains);

Finalize Aggregate  (cost=331523.11..331523.12 rows=1 width=8)
  ->  Gather  (cost=331522.90..331523.11 rows=2 width=8)
        Workers Planned: 2
        ->  Partial Aggregate  (cost=330522.90..330522.91 rows=1 width=8)
              ->  Parallel Index Only Scan using idx_domains2 on session  (cost=0.42..330200.52 rows=128952 width=0)
                    Filter: ('www.foo%'::text <~~ ANY ((domains)::text[]))

OTHER TIPS

You won't be able to index an expression like this at all:

<constant> <operator> ANY(<array column>)

Your only chance would be to define an operator such that your expression looks like:

<array column> <operator> <constant>

But writing a GIN operator class means writing an extension in C, and I don't think you want to go that far.
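To illustrate only the expression shape described above, here is a minimal sketch of an operator that takes the array on the left and the pattern on the right. The function name array_any_like and the operator name ~~> are hypothetical choices, not existing PostgreSQL objects. Defining the operator does not by itself make it indexable; a GIN operator class that knows about it would still be needed.

```sql
-- Hypothetical helper: true if any element of the array matches the LIKE pattern.
CREATE FUNCTION array_any_like (text[], text) RETURNS boolean
LANGUAGE sql IMMUTABLE
AS $$ SELECT EXISTS (SELECT 1 FROM unnest($1) AS elem WHERE elem LIKE $2) $$;

-- Hypothetical operator with the array as the left argument.
-- PROCEDURE is used because it is the spelling accepted on PostgreSQL 10.
CREATE OPERATOR ~~> ( procedure = array_any_like, leftarg = text[], rightarg = text );

-- The predicate now has the form <array column> <operator> <constant>.
SELECT count(*)
FROM sessions
WHERE domains::text[] ~~> 'www.foo%';
```

Without a matching operator class this query is still evaluated row by row, so the sketch changes the syntax, not the performance.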

The easy solution would be to change your data model so that you don't use arrays for things like that.
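One possible shape for such a model is sketched below, assuming a hypothetical child table session_domains with one row per visited domain. The table, column, and index names are illustrative and do not come from the question.

```sql
-- Hypothetical child table: one row per (session, domain) pair
-- instead of a varchar[] column on the sessions table.
CREATE TABLE session_domains
(
   session_id    varchar(24) NOT NULL,
   domain_name   text NOT NULL
);

-- text_pattern_ops lets a b-tree index serve left-anchored LIKE patterns
-- even when the database does not use the C locale.
CREATE INDEX idx_session_domains_name
    ON session_domains (domain_name text_pattern_ops);

-- Count the sessions that visited any domain starting with 'www.foo'.
SELECT count(DISTINCT session_id)
FROM session_domains
WHERE domain_name LIKE 'www.foo%';
```

With this layout a plain LIKE with a constant prefix is enough, so neither the custom operator nor ANY is needed.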

For trigram support, you can try the parray_gin extension:

WHERE domains @@> ARRAY['www.foo%'];

If you just want to do prefix matching (more efficiently than trigrams allow), I don't think there is any way to do that without writing some C code to glue the pieces together. That code would work on the array type directly, so you wouldn't need ANY, and the reverse_like operator would bring no benefit at all.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange