Question

I have a function in my Postgres database whose result depends on how old a record is, so the same record yields a different value depending on when the function is called.

However, I'd like to index or cache the results for an hour or two, because accuracy matters much less than read speed and load efficiency. Is this possible within Postgres, or do I need an external system to help? For example, I could use cron to rebuild a temporary table at a fixed interval and query that table instead of calling the function directly. This is Postgres 11.

The basic algorithm is a measure of hotness, which is essentially a measure of popularity (measured by votes), but decaying over time (like Reddit posts).

The function code is below. It is a bit verbose; for the purpose of the question it probably could just be votes - ageInDays.

-- reddit's hotness algorithm, allegedly
-- https://medium.com/hacking-and-gonzo/how-reddit-ranking-algorithms-work-ef111e33d0d9
-- def hot(ups, downs, date):
--     s = score(ups, downs) // This is ups - downs for reddit, but we don't have downvotes
--     order = log(max(abs(s), 1), 10)
--     sign = 1 if s > 0 else -1 if s < 0 else 0
--     seconds = epoch_seconds(date) - 1134028003
--     return round(sign * order + seconds / 45000, 7)
create function epochSeconds(timestamp) returns int as $$
  select extract(epoch from $1)::int;
$$ language sql stable;

-- we do log2(n+1) instead of log10(n) since we get fewer votes than reddit
create function hotnessOrder(votes integer) returns numeric as $$
  select log(2, greatest(votes + 1, 1));
$$ language sql stable;

create function hotnessSign(votes integer) returns integer as $$
  select(case when votes > 0 then 1 else 0 end);
$$ language sql stable;

create function hotnessSeconds(creation timestamp) returns integer as $$
  select(epochSeconds(creation) - 1134028003);
$$ language sql stable;

-- We inflate hotnessSeconds divisor by a lot. Reddit's is 45000. We want good entities
-- to stay on the front page for several days
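create function hotness(votes integer, creation timestamp) returns double precision as $$
  -- 5.0 forces numeric division; integer / integer would truncate the
  -- time term to whole steps of 225000 seconds (about 2.6 days)
  select round(
    ((hotnessSign(votes) * hotnessOrder(votes))
      + (hotnessSeconds(creation) / 5.0 / 45000))::numeric
  , 7)::double precision;
$$ language sql stable;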

create function entity_hotness(entityId integer) returns double precision as $$

  select hotness(entity_votes(entityId), created)
  from entity
  where entity.id = entityId;

$$ language sql stable;

I'd much rather have a simple solution in Postgres that does not involve external dependencies as this project is intended to be as simple as possible. But this question is just to ask if it's possible to do without external tooling.

Solution

You could use a materialized view, but you would still need an external tool (cron, for example) to refresh it regularly. You could install a PostgreSQL-based scheduler, but that also counts as an external tool, since none ships with PostgreSQL (and it would probably rely on cron internally anyway).
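
As a rough sketch of the materialized-view approach: the view name entity_hotness_cache and the index names are hypothetical, while entity, entity.id and entity_hotness() are taken from the question. The refresh statement at the end is what cron (or any other scheduler) would run every hour or so.

```sql
-- Hypothetical cache of the question's entity_hotness() results.
create materialized view entity_hotness_cache as
  select id as entity_id, entity_hotness(id) as hotness
  from entity;

-- REFRESH ... CONCURRENTLY requires a unique index on the view.
create unique index entity_hotness_cache_entity_id_idx
  on entity_hotness_cache (entity_id);

-- Supports "order by hotness desc" reads for a front page.
create index entity_hotness_cache_hotness_idx
  on entity_hotness_cache (hotness desc);

-- To be run periodically by the external scheduler.
-- CONCURRENTLY lets readers keep querying the old contents during the refresh.
refresh materialized view concurrently entity_hotness_cache;
```

Queries would then read from entity_hotness_cache instead of calling entity_hotness() directly, accepting results that are up to one refresh interval stale.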

You could have the querying application (an already existing "external tool") do the refresh itself. Knowing when to refresh is the tricky part: you would have to store the last refresh time somewhere, trigger a refresh when it is too old, and then update the stored time. If the refresh has to finish before a result is returned, the unlucky user whose request triggers it will have to wait.
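
One way this bookkeeping might look, as a sketch only: it assumes a materialized view named entity_hotness_cache already exists (a hypothetical name), and the table hotness_cache_meta and the function refresh_hotness_if_stale are likewise made up for illustration. The application would call the function in its own statement before reading from the view.

```sql
-- Single-row table holding the time of the last refresh (hypothetical).
create table hotness_cache_meta (
  only_row     boolean primary key default true check (only_row),
  last_refresh timestamptz not null
);
insert into hotness_cache_meta (last_refresh) values (now());

create function refresh_hotness_if_stale(max_age interval default interval '1 hour')
returns boolean as $$
declare
  claimed boolean;
begin
  -- Claim the refresh atomically: of several concurrent callers, only one
  -- UPDATE still matches the stale row once the row lock is released.
  update hotness_cache_meta
     set last_refresh = now()
   where last_refresh < now() - max_age
  returning true into claimed;

  if claimed then
    -- Plain refresh: readers of the view block until it finishes.
    refresh materialized view entity_hotness_cache;
    return true;
  end if;

  return false;  -- cache is still fresh, nothing to do
end;
$$ language plpgsql;

-- Usage: run as its own statement, then query the view.
select refresh_hotness_if_stale(interval '1 hour');
```

The caller that happens to claim the refresh pays for it, which is the waiting cost described above. The function must not be called from within a query that is itself reading the view, since PostgreSQL refuses to refresh a materialized view that is in use by the same session.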

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange