Question

I have a large set of denormalized data with uneven attributes (some attributes are present, some are not), which I insert into a single hstore column. This column contains around 300 key/value pairs with a total size of 5000 characters per row. I want to run string search queries on some of these attributes with ILIKE and OR operators on a total of 100000 rows.

The query:

SELECT hstore->'a' AS a, hstore->'b' AS b,hstore->'c' AS c
  FROM table
  WHERE
       hstore->'x' ILIKE '123%' 
    or hstore->'y' ILIKE '123%'
    or hstore->'z' ILIKE '123%'

On the unindexed table, this query takes over 500 ms according to EXPLAIN ANALYZE.

With my old indexed RDBMS table, where every attribute has its own column, I get much better performance, though with less flexibility.

I tried different/multiple indexes on those hstore attributes, like

CREATE INDEX idx_table_hstore ON table( (hstore->'a') )

and one index for each, but the performance is the same as having no index at all.

As far as I understand, GIN/GiST indexes wouldn't make much sense here, since the column is pretty large and doesn't need geometric operators (I may be wrong on that one).

What indexing method would you use in such a situation to achieve similar or even better performance than with a classic model?

Solution

This depends a lot on your specific use case, which isn't entirely clear. In your sample query you're testing the values of keys x, y, and z. If those three keys (or some relatively small subset of all your keys) are the only ones used for lookups, you might consider moving them to their own columns - then your lookup fields are fixed but you still have the flexibility of the hstore column.
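As a rough sketch of that idea, promoting the lookup keys to real columns could look something like the following. The table name t and hstore column name h are placeholders I'm assuming here (your real names will differ), and I'm assuming plain text is an acceptable type for the values:

```sql
-- Assumed names: table t with an hstore column h; x, y, z are the lookup keys.
ALTER TABLE t
    ADD COLUMN x text,
    ADD COLUMN y text,
    ADD COLUMN z text;

-- Copy the current values out of the hstore (missing keys simply become NULL).
UPDATE t
SET x = h -> 'x',
    y = h -> 'y',
    z = h -> 'z';

-- Optional: remove the promoted keys from the hstore so each value lives in one place.
UPDATE t
SET h = delete(h, ARRAY['x', 'y', 'z']);
```

The remaining keys stay in the hstore, so you only give up flexibility for the three fields you actually search on.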

It's also not clear if you've created indexes on each individual key or just the lookup columns. If you did one of those on every key you're talking about around 300 indexes (you mention having about 300 keys), and then you're also giving up some of the flexibility of the hstore (by having to create one of those indexes for every single key). I'd stick to just the lookup columns (x, y, z) here and tweak them a bit to look like this:

create index idx_t_h_x on t ((lower(h->'x')));

The index you mentioned doesn't support the ilike operator, so you'll need to index on lower (or upper) of your values then modify your predicate to match, like so:

SELECT hstore->'a' AS a, hstore->'b' AS b,hstore->'c' AS c
FROM table
WHERE lower(hstore->'x') LIKE '123%'
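Putting the pieces together for all three lookup keys, a sketch might look like this. Again, t and h are placeholder names and the index names are made up for illustration. One assumption worth calling out: if your database doesn't use the C locale, a plain b-tree index generally can't serve LIKE 'prefix%' searches, so I've added the text_pattern_ops operator class; you may not need it in your setup.

```sql
-- Assumed names: table t, hstore column h; index names are hypothetical.
-- text_pattern_ops lets a b-tree index serve LIKE 'prefix%' in non-C locales.
CREATE INDEX idx_t_h_x_lower ON t ((lower(h -> 'x')) text_pattern_ops);
CREATE INDEX idx_t_h_y_lower ON t ((lower(h -> 'y')) text_pattern_ops);
CREATE INDEX idx_t_h_z_lower ON t ((lower(h -> 'z')) text_pattern_ops);

-- Each predicate must match an indexed expression exactly, and the pattern
-- itself has to be lowercase since we compare against lower(...).
SELECT h -> 'a' AS a, h -> 'b' AS b, h -> 'c' AS c
FROM t
WHERE lower(h -> 'x') LIKE '123%'
   OR lower(h -> 'y') LIKE '123%'
   OR lower(h -> 'z') LIKE '123%';
```

With one index per key, the planner has the option of combining the three index scans for the OR conditions instead of scanning the whole table, though whether it does so depends on your data and settings.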

Also, gin/gist indexes aren't only for geometric operations (in fact the "g" in both names is "generalized" - they're intended to be multi-purpose). If you check out the docs for the hstore module you'll see which operators are supported by either a gist or gin index on a hstore column*. One of those is "?", which tests if a key is present. Depending on the sparsity of your lookup keys (x, y, z), you might have some luck by defining a gist or gin index on the column and adding an extra condition like "where (hstore ? 'x' and hstore->'x' ilike '123%')"; assuming not many rows have the key x this should give you a decent boost, otherwise if the key x is in nearly every row you'll be back to full table scans.
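To make the "?" idea concrete, here's a minimal sketch using a gin index (same placeholder names t and h as before, and a made-up index name). Whether it helps depends entirely on how rare the keys are in your data:

```sql
-- Assumed names: table t, hstore column h; index name is hypothetical.
CREATE INDEX idx_t_h_gin ON t USING gin (h);

-- The "?" test can use the gin index to narrow the candidate rows;
-- the ILIKE is then only evaluated on rows that actually have the key.
SELECT h -> 'a' AS a, h -> 'b' AS b, h -> 'c' AS c
FROM t
WHERE (h ? 'x' AND h -> 'x' ILIKE '123%')
   OR (h ? 'y' AND h -> 'y' ILIKE '123%')
   OR (h ? 'z' AND h -> 'z' ILIKE '123%');
```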

When it comes to deciding whether to use gist or gin, if you check around the postgres docs and here on SO you'll find some guidelines, basically that gin tends to be faster to lookup but takes more space and is slower to build and maintain (meaning keep in mind whether you're writing or reading data more) - I'm not sure if there are specific recommendations for the hstore type.

Oh, and, obviously this all assumes your server is configured appropriately for your hardware and usage. As I pointed out, the index you provided doesn't support the ilike operator, so that'll never be used. Once you get an index that you think should be used, you might try disabling table scans (check the config for enable_seqscan) to see if you can figure out why the planner isn't using it. If your config is out of the box you might have random_page_cost set high, you might be doing lots of on disk sorts if your work_mem isn't high enough, etc.
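For the troubleshooting part, something along these lines is what I have in mind. It only changes settings for the current session, and the query is just the lowercase-LIKE version with the placeholder names t and h:

```sql
-- Discourage sequential scans for this session only, to see whether
-- the planner is able to use the index at all.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT h -> 'a' AS a, h -> 'b' AS b, h -> 'c' AS c
FROM t
WHERE lower(h -> 'x') LIKE '123%';

-- Put the setting back, then check the values mentioned above.
RESET enable_seqscan;
SHOW random_page_cost;
SHOW work_mem;
```

If the plan still shows a sequential scan with enable_seqscan off, the index most likely can't be used for that predicate at all; if it switches to an index scan but gets slower, the planner's original choice was probably reasonable for your data.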

*Just to point out a theme here, not all index types support all operators.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow