How to optimize query for text array in PostgreSQL using `@>` operator
15-02-2021
Question
I have a column of type `text[]`, and I want to search over it using the `@>` operator.

Note: `@>` is the array containment operator. A filter like `['a','b'] in ['a','b','c']` matches: the returned rows are those where the values passed are a subset of the column's array.
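To make the containment semantics concrete, here is a minimal sketch (standalone expressions, not tied to any table) showing which comparisons match:

```sql
-- @> is true when the right-hand array is contained in the left-hand one
select array['a','b','c'] @> array['a','b'];  -- true: {a,b} is a subset
select array['a','b','c'] @> array['a','z'];  -- false: 'z' is not present
```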
The problem is that string comparison is slower than integer comparison. I'm wondering whether Postgres could hash the values first and then compare only the hashes.

Note that I cannot rely on an index alone: more than one column is involved, and the query first filters on an id before filtering on these multi-valued columns.

My question: is there some feature in PostgreSQL that supports comparing integers instead of strings?
Solution
If the strings are from a restricted set, you can define an ENUM datatype. This translates the strings to integers behind the scenes.
```sql
create type alph as enum ('a','b','c','d','e','f','g','h','i','j','k','l','m',
                          'n','o','p','q','r','s','t','u','v','w','x','y','z');

-- 10 million random single-letter values, aggregated into ~1 million text[] arrays
create table j as
  select floor(random()*100)::int,
         array_agg(substring('abcdefghijklmnopqrstuvwxyz', floor(random()*26)::int+1, 1))
  from generate_series(1,10000000) f(x)
  group by x%1000000;

-- the same data with the arrays cast to the enum type
create table j2 as select floor, array_agg::alph[] from j;
```
I get about a 2-fold speed improvement by doing:

```sql
select * from j2 where array_agg @> '{a,b}';
```

rather than

```sql
select * from j where array_agg @> '{a,b}';
```
If I include the condition `and floor=7` (after creating an index on `floor`), then both queries are so fast that any difference in speed cannot be reliably detected.
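For reference, the combined test above looks roughly like this (the index name is my own choice, not from the original benchmark):

```sql
-- b-tree index on the integer column used for the first filter
create index j2_floor_idx on j2 (floor);

-- the index narrows the candidate rows before the array containment check runs,
-- so the cost of comparing text vs. enum arrays becomes negligible
select * from j2 where floor = 7 and array_agg @> '{a,b}';
```

This matches the questioner's stated access pattern: filter on an id-like column first, then apply `@>` to the surviving rows.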
This seems like the essence of premature optimization to me.