Question

I've got a PostgreSQL table called queries_query, which has many columns.

Two of these columns, created and user_sid, are frequently used together in SQL queries by my application to determine how many queries a given user has run over the past 30 days. It is very, very rare that I query these stats for any period older than the most recent 30 days.

Here is my question:

I've currently created my multi-column index on these two columns by running:

CREATE INDEX CONCURRENTLY some_index_name ON queries_query (user_sid, created)

But I'd like to further restrict the index to only care about those queries in which the created date is within the past 30 days. I've tried doing the following:

CREATE INDEX CONCURRENTLY some_index_name ON queries_query (user_sid, created)
WHERE created >= NOW() - '30 days'::INTERVAL

But this throws an exception stating that my function must be immutable.

I'd love to get this working so that I can optimize my index, and cut back on the resources Postgres needs to do these repeated queries.


Solution

You get an exception using now() because the function is not IMMUTABLE (it is only STABLE: its result changes from one transaction to the next) and, quoting the manual:

All functions and operators used in an index definition must be "immutable" ...
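To see which volatility category PostgreSQL records for now(), the system catalog can be queried. This is only a quick check, assuming a standard installation:

```sql
-- provolatile: 'i' = immutable, 's' = stable, 'v' = volatile
SELECT proname, provolatile
FROM   pg_catalog.pg_proc
WHERE  proname = 'now';
```

now() is listed as stable ('s'), not immutable, which is why it is rejected in an index predicate.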

I see two ways to utilize a (much more efficient) partial index:

1. Partial index with condition using constant date:

CREATE INDEX queries_recent_idx ON queries_query (user_sid, created)
WHERE created > '2013-01-07 00:00'::timestamp;

This assumes created is actually defined as timestamp. Providing a timestamp constant for a timestamptz column (timestamp with time zone) would not work: the cast from timestamp to timestamptz (or vice versa) depends on the current time zone setting and is not immutable. Use a constant of matching data type.

Drop and recreate that index at hours with low traffic, maybe with a cron job on a daily or weekly basis (or whatever is good enough for you). Creating an index is pretty fast, especially a partial index that is comparatively small. This solution also doesn't need to add anything to the table.
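For the planner to consider the partial index, the query's WHERE clause has to imply the index predicate at planning time. A minimal sketch of such a query, assuming created is a timestamp column; the user_sid value is a made-up placeholder (its data type is not stated in the question), and the cutoff is computed by the application and sent as a literal:

```sql
-- Hypothetical lookup: count one user's recent queries.
-- The literal cutoff is later than the index cutoff (2013-01-07),
-- so the planner can prove that the index predicate holds.
EXPLAIN
SELECT count(*)
FROM   queries_query
WHERE  user_sid = 'some_user'                      -- placeholder value
AND    created > '2013-01-10 00:00'::timestamp;    -- constant, matching type
```

A cutoff written as now() - interval '30 days' inside the query is only evaluated at run time, so the planner may not be able to match it against the index predicate. Check the plan with EXPLAIN to be sure.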

Assuming no concurrent access to the table, automatic index recreation could be done with a function like this:

CREATE OR REPLACE FUNCTION f_index_recreate()
  RETURNS void
  LANGUAGE plpgsql AS
$func$
BEGIN
   DROP INDEX IF EXISTS queries_recent_idx;
   EXECUTE format('
      CREATE INDEX queries_recent_idx
      ON queries_query (user_sid, created)
      WHERE created > %L::timestamp'
    , LOCALTIMESTAMP - interval '30 days');  -- timestamp constant
--  , now() - interval '30 days');           -- alternative for timestamptz
END
$func$;

Call:

SELECT f_index_recreate();

now() (like you had) is the equivalent of CURRENT_TIMESTAMP and returns timestamptz. Cast to timestamp with now()::timestamp or use LOCALTIMESTAMP instead.
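The resulting data types can be checked directly with pg_typeof():

```sql
SELECT pg_typeof(now())            AS now_type             -- timestamp with time zone
     , pg_typeof(LOCALTIMESTAMP)   AS localtimestamp_type  -- timestamp without time zone
     , pg_typeof(now()::timestamp) AS cast_type;           -- timestamp without time zone
```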



If you have to deal with concurrent access to the table, use DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY. But you can't wrap these commands into a function because, per documentation:

... a regular CREATE INDEX command can be performed within a transaction block, but CREATE INDEX CONCURRENTLY cannot.

So, with two separate transactions:

CREATE INDEX CONCURRENTLY queries_recent_idx2 ON queries_query (user_sid, created)
WHERE  created > '2013-01-07 00:00'::timestamp;  -- your new condition

Then:

DROP INDEX CONCURRENTLY IF EXISTS queries_recent_idx;

Optionally, rename to old name:

ALTER INDEX queries_recent_idx2 RENAME TO queries_recent_idx;
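To automate these steps with a computed cutoff, one option is a psql script, since psql sends each statement on its own rather than inside a function. The following is a sketch, not a tested job: the file name is hypothetical, it relies on the psql meta-command \gexec (psql 9.6 or later), and it assumes created is timestamp and that psql runs in its default autocommit mode.

```sql
-- rebuild_recent_idx.sql (hypothetical file name)
-- Run with: psql -d your_db -f rebuild_recent_idx.sql
-- Do not add --single-transaction: CONCURRENTLY cannot run in a transaction block.

-- Abort on the first error, so the old index is not dropped if the build fails.
\set ON_ERROR_STOP on

-- Remove a leftover (possibly invalid) index from a failed earlier run.
DROP INDEX CONCURRENTLY IF EXISTS queries_recent_idx2;

-- Build the CREATE INDEX statement with a constant cutoff;
-- \gexec then executes the generated statement.
SELECT format(
   'CREATE INDEX CONCURRENTLY queries_recent_idx2
    ON queries_query (user_sid, created)
    WHERE created > %L::timestamp'
 , LOCALTIMESTAMP - interval '30 days')
\gexec

-- Swap: drop the old index, then take over its name.
DROP INDEX CONCURRENTLY IF EXISTS queries_recent_idx;
ALTER INDEX queries_recent_idx2 RENAME TO queries_recent_idx;
```

Setting ON_ERROR_STOP matters here: without it, a failed CREATE INDEX would be followed by the DROP of the still-working old index.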

2. Partial index with condition on "archived" tag

Add an archived tag to your table:

ALTER TABLE queries_query ADD COLUMN archived boolean NOT NULL DEFAULT FALSE;

UPDATE the column at intervals of your choosing to "retire" older rows and create an index like:

CREATE INDEX some_index_name ON queries_query (user_sid, created)
WHERE NOT archived;

Add a matching condition (NOT archived) to your queries, even if it seems redundant, to allow them to use the index. Check with EXPLAIN ANALYZE whether the query planner catches on: it should be able to use the index for queries on a newer date. But it won't understand more complex conditions that do not match exactly.

You don't have to drop and recreate the index, but the UPDATE on the table may be more expensive than index recreation and the table gets slightly bigger.
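A sketch of what the periodic UPDATE and a matching query could look like, assuming created is timestamp; the user_sid value is a made-up placeholder, as its data type is not stated in the question:

```sql
-- Periodic maintenance: retire rows older than 30 days.
-- Filtering on NOT archived avoids rewriting rows that are already retired.
UPDATE queries_query
SET    archived = true
WHERE  NOT archived
AND    created < LOCALTIMESTAMP - interval '30 days';

-- Application query: repeat the index predicate (NOT archived) verbatim
-- so the planner can match it against the partial index.
EXPLAIN ANALYZE
SELECT count(*)
FROM   queries_query
WHERE  user_sid = 'some_user'                           -- placeholder value
AND    created >= LOCALTIMESTAMP - interval '30 days'
AND    NOT archived;
```

Because the date condition stays in the query, results remain correct even when the UPDATE has not run for a while; the index then merely contains some rows older than 30 days.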

I would go with the first option (index recreation). In fact, I am using this solution in several databases. The second incurs more costly updates.

Both solutions retain their usefulness over time, but performance slowly deteriorates as more outdated rows are included in the index, until the next index recreation or UPDATE.

Licensed under: CC-BY-SA with attribution