Вопрос

I have a Django website with a postgresql 9.3.10 backend. The website contains dynamic user-generated content (think of it as a form of 9gag). There are some tables from which I routinely delete aged rows. The database is called dbname and I log into it via the user postgres. The following are two examples of such routine deletion:

begin;
DELETE FROM links_groupseen WHERE which_reply_id IN (SELECT id FROM links_reply where "submitted_on" < now() - interval '7 days');
DELETE FROM links_reply WHERE "submitted_on" < now() - interval '7 days';
commit;

begin;
DELETE FROM links_vote WHERE link_id IN (SELECT id FROM links_link WHERE id NOT IN (select answer_to_id from links_publicreply) AND "submitted_on" < now() - interval '1 hour');
DELETE FROM links_photoobjectsubscription WHERE which_link_id IN (SELECT id FROM links_link WHERE id NOT IN (select answer_to_id from links_publicreply) AND "submitted_on" < now() - interval '1 hour');
DELETE FROM links_link WHERE id NOT IN (select answer_to_id from links_publicreply) AND "submitted_on" < now() - interval '1 hour';
commit;

There are ~10 other such operations I run. At the moment, I do everything manually - I want to automate these to run at a certain low-utlization time of day. How do I do that? Can someone give me an illustrative example?


As per my own rudimentary knowledge, I'm thinking I should first save the following shell script to /etc/cron.daily:

#!/bin/sh

dbname="dbname"
username="postgres"
psql $dbname $username << EOF
begin;
DELETE FROM links_groupseen WHERE which_reply_id IN (SELECT id FROM links_reply where "submitted_on" < now() - interval '7 days');
DELETE FROM links_reply WHERE "submitted_on" < now() - interval '7 days';
commit;
begin;
DELETE FROM links_vote WHERE link_id IN (SELECT id FROM links_link WHERE id NOT IN (select answer_to_id from links_publicreply) AND "submitted_on" < now() - interval '1 hour');
DELETE FROM links_photoobjectsubscription WHERE which_link_id IN (SELECT id FROM links_link WHERE id NOT IN (select answer_to_id from links_publicreply) AND "submitted_on" < now() - interval '1 hour');
DELETE FROM links_link WHERE id NOT IN (select answer_to_id from links_publicreply) AND "submitted_on" < now() - interval '1 hour';
commit;
EOF

And then add the said script to crontab like so:

30 4 * * * /etc/cron.daily/mypgscript

And then service crond restart

Это было полезно?

Решение

Back in 2013 a similar question was asked (How to run recurring tasks on a Postgresql database without a cron-like tool?). The crontab or pgAgent were the only way to go.

For version 9.3 there is already the Background Worker Processes. But even in the version 9.5.2 documentation there is a warning about using it.

I think that the cron solution is a good way to go but check the access to the database. Limit the access of the user to one server only because otherwise anybody can use this user from any place to do the deletion and they might not delete only the older ones.

Другие советы

A method that I find useful is to implement the cleanup code in my main programming framework (so Django, I guess, in your case) and expose that function via a URL. Then in my crontab I use wget or curl to invoke the cleanup URL. That way the cron file is only responsible for the scheduling part, while the code that does the cleanup is kept together with the code that creates the data that needs to be cleaned up.

Лицензировано под: CC-BY-SA с атрибуция
Не связан с dba.stackexchange
scroll top