How to optimize query for text array in PostgreSQL using `@>` operator
15-02-2021
Question
I have a column of type `text[]`, and I want to search over it using the `@>` operator.

Note: `@>` is the array containment operator. A filter like `['a','b'] in ['a','b','c']` matches: the returned rows are those where the values passed are a subset of the column's array.
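To make the containment semantics concrete, here is a minimal sketch (standalone expressions, not tied to any table) showing which comparisons match:

```sql
-- @> is true when the right-hand array is contained in the left-hand one
select array['a','b','c'] @> array['a','b'];  -- true: {a,b} is a subset
select array['a','b','c'] @> array['a','z'];  -- false: 'z' is not present
```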
The problem is that string comparison is slower than integer comparison. I'm wondering whether Postgres could hash the values first and then compare only the hashes.

Note that I cannot rely on an index alone: more than one column is involved, and the query first filters on an id before filtering on these multi-valued columns.

My question: is there some feature in PostgreSQL that supports comparing integers instead of strings?
Solution
If the strings are from a restricted set, you can define an ENUM datatype. This translates the strings to integers behind the scenes.
```sql
create type alph as enum ('a','b','c','d','e','f','g','h','i','j','k','l','m',
                          'n','o','p','q','r','s','t','u','v','w','x','y','z');

-- 10 million random single-letter values, aggregated into ~1 million text[] arrays
create table j as
  select floor(random()*100)::int,
         array_agg(substring('abcdefghijklmnopqrstuvwxyz', floor(random()*26)::int+1, 1))
  from generate_series(1,10000000) f(x)
  group by x%1000000;

-- the same data with the arrays cast to the enum type
create table j2 as select floor, array_agg::alph[] from j;
```
I get about a 2-fold speed improvement by doing:

```sql
select * from j2 where array_agg @> '{a,b}';
```

rather than

```sql
select * from j where array_agg @> '{a,b}';
```
If I include the condition `and floor=7` (after creating an index on `floor`), then both queries are so fast that any difference in speed cannot be reliably detected.
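For reference, the combined test above looks roughly like this (the index name is my own choice, not from the original benchmark):

```sql
-- b-tree index on the integer column used for the first filter
create index j2_floor_idx on j2 (floor);

-- the index narrows the candidate rows before the array containment check runs,
-- so the cost of comparing text vs. enum arrays becomes negligible
select * from j2 where floor = 7 and array_agg @> '{a,b}';
```

This matches the questioner's stated access pattern: filter on an id-like column first, then apply `@>` to the surviving rows.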
This seems like the essence of premature optimization to me.