Question

I have a PostgreSQL table with several text[] (array) columns defined on it. I'm using these columns to search for a specific record in the database like this:

select obj from business
where ((('street' = ANY (address_line_1)
    and 'a_city' = ANY (city)
    and 'a_state' = ANY (state))
or    ('street' = ANY (address_line_1)
    and '1234' = ANY (zip_code)))
and ('a_business_name' = ANY (business_name)
    or 'a_website' = ANY (website_url)
    or array['123'] && phone_numbers))

The problem I'm having is that with about 1 million records, the query gets really slow. My question is simple: are there different types of indexes for array columns? Does anybody know the best type of index to create in this case (assuming there are different types)?

Just in case, this is the EXPLAIN ANALYZE output:

"Seq Scan on business  (cost=0.00..207254.51 rows=1 width=32) (actual time=18850.462..18850.462 rows=0 loops=1)"
"  Filter: (('a'::text = ANY (address_line_1)) AND (('a'::text = ANY (business_name)) OR ('a'::text = ANY (website_url)) OR ('{123}'::text[] && phone_numbers)) AND ((('a'::text = ANY (city)) AND ('a'::text = ANY (state))) OR ('1234'::text = ANY (zip_code))))"
"  Rows Removed by Filter: 900506"
"Total runtime: 18850.523 ms"

Thanks in advance!

Solution

You can use a GIN index to effectively improve performance with arrays.
Use it in combination with the array operators (@>, <@, &&, =), which are the operators the index supports.

For instance:

CREATE INDEX business_address_line_1_idx ON business USING GIN (address_line_1);

Do that for every array column involved in the conditions.
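
Note that the GIN index is used through those array operators, not through the 'constant' = ANY (array_column) construct in your query, so the predicates would need to be rewritten into the equivalent operator form before the planner can use the indexes. A sketch of such a rewrite, keeping your column names and sample values (untested against your schema):

-- element membership rewritten as array containment (@>) / overlap (&&),
-- which the GIN indexes on the array columns can serve
SELECT obj
FROM   business
WHERE  ((address_line_1 @> ARRAY['street']
         AND city  @> ARRAY['a_city']
         AND state @> ARRAY['a_state'])
     OR (address_line_1 @> ARRAY['street']
         AND zip_code @> ARRAY['1234']))
AND    (business_name @> ARRAY['a_business_name']
     OR website_url   @> ARRAY['a_website']
     OR phone_numbers && ARRAY['123']);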

It might be worth considering normalizing your schema instead. Splitting the multiple entries out into a separate (1:n or n:m) table would probably serve you better, as sketched below. It often does in the long run, even if it seems like more work at first.
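
For illustration, a minimal sketch of what a 1:n design for the phone numbers might look like; the business.id key and the child table/column names are assumptions, not part of your original schema:

-- one row per phone number, indexed with a plain B-tree
CREATE TABLE business_phone (
    business_id bigint NOT NULL REFERENCES business (id),
    phone       text   NOT NULL,
    PRIMARY KEY (business_id, phone)
);

CREATE INDEX business_phone_phone_idx ON business_phone (phone);

-- the membership test then becomes an EXISTS (or a join)
SELECT b.obj
FROM   business b
WHERE  EXISTS (
    SELECT 1
    FROM   business_phone p
    WHERE  p.business_id = b.id
    AND    p.phone = '123'
);

A plain B-tree equality lookup like this is typically cheaper and easier for the planner to estimate than array containment checks.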

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow