Get Day, Month, Year, Lifetime total records with one query w/ optimizations

https://stackoverflow.com/questions/6127729

07-01-2021
|

Frage

I have a Postgres DB running 7.4 (Yeah we're in the midst of upgrading)

I have four separate queries to get the Daily, Monthly, Yearly and Lifetime record counts

SELECT COUNT(field)
FROM database
WHERE date_field
    BETWEEN DATE_TRUNC('DAY' LOCALTIMESTAMP) 
    AND DATE_TRUNC('DAY' LOCALTIMESTAMP) + INTERVAL '1 DAY'

For Month just replace the word DAY with MONTH in the query and so on for each time duration.

Looking for ideas on how to get all the desired results with one query and any optimizations one would recommend.

Thanks in advance!

NOTE: date_field is timestamp without time zone

UPDATE:

Sorry I do filter out records with additional query constraints, just wanted to give the gist of the date_field comparisons. Sorry for any confusion

Lösung

I have some idea of using prepared statements and simple statistics (record_count_t) table for that:

-- DROP TABLE IF EXISTS record_count_t;
-- DEALLOCATE record_count;
-- DROP FUNCTION updateRecordCounts();

CREATE TABLE record_count_t (type char, count bigint);
INSERT INTO record_count_t (type) VALUES ('d'), ('m'), ('y'), ('l');

PREPARE record_count (text) AS
UPDATE record_count_t SET count =
(SELECT COUNT(field)
FROM database
WHERE
CASE WHEN $1 <> 'l' THEN
    DATE_TRUNC($1, date_field) = DATE_TRUNC($1, LOCALTIMESTAMP)
ELSE TRUE END)
WHERE type = $1;

CREATE FUNCTION updateRecordCounts() RETURNS void AS
$$
    EXECUTE record_count('d');
    EXECUTE record_count('m');
    EXECUTE record_count('y');
    EXECUTE record_count('l');
$$
LANGUAGE SQL;

SELECT updateRecordCounts();
SELECT type,count FROM record_count_t;

Use updateRecordCounts() function any time you need update statistics.

Andere Tipps

I'd guess that it is not possible to optimize this any further than it already is.

If you're collecting daily/monthly/yearly stats, as I'm assuming you are doing, one option (after upgrading, of course) is a with statement and the relevant joins, e.g.:

with daily_stats as (
(what you posted)
),
monthly_stats as (
(what you posted monthly)
),
etc.
select daily_stats.stats,
       monthly_stats.stats,
       etc.
stats
left join yearly_stats on ...
left join monthly_stats on ...
left join daily_stats on ...

However, that will actually perform less well than running each query separately in a production environment, because you'll introduce left joins in the DB which could be done just as well in the middleware (i.e. show daily, then monthly, then yearly and finally lifetime stats). (If not better, since you'll be avoiding full table scans.)

By keeping things as if, you'll save the precious DB resources to deal with reads and writes on actual data. The tradeoff (less network traffic between your database and your app) is almost certainly not worth it.

Yikes! Don't do this!!! Not because you can't do what you're asking, but because you probably shouldn't be doing what you're asking in this manner. I'm guessing the reason you've got date_field in your example is because you've got a date_field attached to a user or some other meta-data.

Think about it: you are asking PostgreSQL to scan 100% of the records relevant to a given user. Unless this is a one-time operation, you almost assuredly do not want to do this. If this is a one-time operation and you are planning on caching this value as a meta-data, then who cares about the optimizations? Space is cheap and will save you heaps of execution time down the road.

You should add 4x per-user (or whatever it is) meta-data fields that help sum up the data. You have two options, I'll let you figure out how to use this so that you keep historical counts, but here's the easy version:

CREATE TABLE user_counts_only_keep_current (
  user_id , -- Your user_id
  lifetime INT DEFAULT 0,
  yearly INT DEFAULT 0,
  monthly INT DEFAULT 0,
  daily INT DEFAULT 0,
  last_update_utc TIMESTAMP WITH  TIME ZONE,
  FOREIGN KEY(user_id) REFERENCES "user"(id)
);
CREATE UNIQUE INDEX this_tbl_user_id_udx ON user_counts_only_keep_current(user_id);

Setup some stored procedures that zero out individual columns if last_update_utc doesn't match the current day according to NOW(). You can get creative from here, but incrementing records like this is going to be the way to go.

Handling of time series data in any relational database requires special handling and maintenance. Look in to PostgreSQL's table inheritance if you want good temporal data management.... but really, don't do whatever it is you're about to do to your application because it's almost certainly going to result in bad things(tm).

Lizenziert unter: CC-BY-SA mit Zuschreibung

Nicht verbunden mit StackOverflow