Question

SHORT STORY: I have a partitioned Postgres database with a table that tracks the partitions, plus a set of triggers. The triggers need to alter the constraints on the partition tables (each partition's valid_date [a daterange] changes depending on the other partitions' valid_date) and preferably be able to delete tables, but this causes errors because the tables are in use by the trigger chain.

cannot ALTER TABLE "core_geometryrecord_8_2" because it is being used by active queries in this session

FULL QUESTION: I am giving a thorough explanation of my schema and trigger architecture in case it is needed or someone wants to know why I feel the need for a 'convoluted' system, though it may not be necessary.

I am refactoring a large database of geometry data to make it easier and faster to use. The old system has 'geometry tables' that hold related geometries (e.g. counties of the USA). The data should really be grouped by "date_valid" and "geometrytable_id".

We decided to use postgres partitioning with constraints on the date_valid (a date range) and the geometrytable_id (foreign key).

Because the dates are sensitive and take a lot of bookkeeping to keep correct, I tried my hand at managing a large part of a DB with triggers for the first time (I already knew I needed an insert trigger for the partitioned table, so why not).

I designed a set of triggers that manages the list of partitions and does all the bookkeeping when you insert or remove a geometry. Here is what they do.

  • On insert into partition list table, create the partition table and apply constraints. If there is a partition valid for a date overlapping this new one, trim it (Call Update Trigger). If there is a partition valid for a date after this one, trim the new partition's end date to the start date of the next partition (keep a continuous timeline).
  • On delete from partition list table, delete the partition table. Recalculate the date ranges of the partition_list entries (Call Update Trigger) around the removed partition (if any) to make the dates continuous.
  • On Update partition list table, DROP OLD DATE CONSTRAINT, change date_valid on all the rows in the partition table to the new date_valid value of the partition, then CREATE DATE CONSTRAINT with new date (constraint is that the items have the exact same date_valid as partition).
  • On insert into geometryrecord (the partitioned table), find what partition should be used, or INSERT into partition list table (TRIGGERS ON INSERT). Insert NEW into the correct partition.
  • On Delete from geometryrecord, if partition table is now empty, delete the partition entry from partition list table (CALLS DELETE TRIGGER).
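The date bookkeeping in the insert rule above (trim an earlier partition that overlaps the new one, and cap the new partition at the start of the next one) can be sketched in plain Python, outside the database. This is only a model under assumptions: ranges are half-open `[start, end)`, an open-ended partition is represented with `date.max`, and the names `Partition`, `insert_partition` and `p_2000` etc. are hypothetical, not taken from the actual triggers.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Partition:
    name: str
    start: date  # inclusive
    end: date    # exclusive; date.max stands in for "open-ended"


def insert_partition(timeline, new):
    """Add `new` to the timeline of one geometrytable_id (assumed model).

    Mirrors the insert rule: trim the earlier partition that overlaps the
    new one, and cap the new one at the start of the next partition.
    """
    for p in timeline:
        # An earlier partition that runs past the new start gets trimmed.
        if p.start < new.start < p.end:
            p.end = new.start
    later = [p for p in timeline if p.start > new.start]
    if later:
        # Keep the timeline continuous: end where the next one begins.
        new.end = min(p.start for p in later)
    timeline.append(new)
    timeline.sort(key=lambda p: p.start)
    return timeline


timeline = [
    Partition("p_2000", date(2000, 1, 1), date(2010, 1, 1)),
    Partition("p_2010", date(2010, 1, 1), date.max),
]
insert_partition(timeline, Partition("p_2005", date(2005, 1, 1), date.max))
for p in timeline:
    print(p.name, p.start, p.end)
```

Here the existing `p_2000` range is cut back to end where `p_2005` starts, and `p_2005` is capped where `p_2010` starts. In the real system each of those two changes is an ALTER on a partition table, which is where the error comes from.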

Postgres raises the error above whenever I try to change a table in any way if I have touched it at all in any of the triggers in the chain. The partition list table's triggers work perfectly if I insert into that table directly, but deleting from a geometry table (which fires the same trigger) fails even though all it does is a SELECT (to see if the partition table is empty). Inserting into the partitioned geometry table can also cause issues, because I have to remove the constraint in order to change the date_valid of the rows.

There has to be a way to do this that I just don't understand. (I also had to make the deleting of the table a more passive 'mark it for deletion by a cron job' because I can't delete the table from a trigger call originating from the table I want to delete.)

Any advice would be greatly welcome. I just can't believe that no one has ever needed to do something like this, so I assume I just don't know what I am doing :).


Solution

So after digging around and experimenting I figured out what I needed and thought I would document my discovery.

Here is the docs page on how basic postgres partitioning works for anyone interested: http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html

The key point to know is that for every partition, you have to set constraints that specify what can be found in that partition (splitting on the id so that only 200,000 records are in each table would simply require one constraint per table). When you query the master table, Postgres quickly checks the request against the constraints of all the child tables. Only tables whose constraints could match the query are actually queried. If you do this right, only one table is truly queried.

So as for what I learned:

First off, I don't think it is possible at all to drop a table from its own trigger. My solution was to mark it to be dropped through some other mechanism (like a table of things to drop later). The downside is that the table stays around until the cron job runs. Since altering a table from its own trigger is also impossible, the table remains a partition, just one marked to be dropped later. If your partitioning is set up correctly this may cause no problems. In my case there can be two overlapping partitions (one that is marked to be dropped, and one that should be there). The side effect is that queries for rows that could be in either of these tables will hit both. That is fine for me until the cron job comes by, but it might not be acceptable in every case.
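The "mark it now, drop it later" workaround can be modeled as a small queue. This is a rough sketch, not my actual code: in the real system the queue would be a table written by the trigger, and the drop would be issued by the cron job from its own session. The names `pending_drops`, `mark_for_drop` and `run_drop_job` are made up, and `execute` is a hypothetical callback standing in for whatever runs SQL against the database.

```python
pending_drops = []  # stands in for a "tables to drop later" table


def mark_for_drop(table_name):
    """What the trigger does: record the name and touch nothing else."""
    if table_name not in pending_drops:
        pending_drops.append(table_name)


def run_drop_job(execute):
    """What the cron job does later, from a separate session."""
    dropped = []
    while pending_drops:
        table_name = pending_drops.pop(0)
        # Quote the identifier, doubling any embedded double quotes.
        quoted = '"{}"'.format(table_name.replace('"', '""'))
        execute("DROP TABLE IF EXISTS " + quoted)
        dropped.append(table_name)
    return dropped


mark_for_drop("core_geometryrecord_8_2")
mark_for_drop("core_geometryrecord_8_2")  # marking twice is harmless
statements = []
run_drop_job(statements.append)  # collect the SQL instead of running it
print(statements)
```

Because the trigger only writes a row and never touches the partition itself, it avoids the "being used by active queries in this session" error. The cost is the window in which the marked table still exists.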

Secondly, I realized that inserts that altered tables they didn't write to were working fine, but deletes and updates that fired the same trigger chain were failing because the tables were in use by the session. I assumed this was something crazy, but the real issue was that I was partitioned on a date and a foreign key but was deleting by the id of the record. Postgres checked the constraints of all of the tables, which gave it no insight into which table to use, so it scanned all of them, leaving every partition in use by the query. All I had to do to get the delete to work was specify the columns I partitioned on, so it knew which table to look in instead of scanning the whole set of tables looking for an id.
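To illustrate why filtering by id alone touches every partition, here is a toy model of the constraint check in Python. It is not how the Postgres planner is implemented, just the idea: a child table can be skipped only when one of the query's filters contradicts that table's constraints. The table names other than `core_geometryrecord_8_2`, the ids, and the date ranges are all made up, and `tables_to_scan` is a hypothetical helper.

```python
from datetime import date

# Each child table's constraints: one geometrytable_id and one exact
# date_valid range. All values here are invented for illustration.
R_2000 = (date(2000, 1, 1), date(2010, 1, 1))
R_2010 = (date(2010, 1, 1), date(2020, 1, 1))
partitions = {
    "core_geometryrecord_8_1": {"geometrytable_id": 8, "date_valid": R_2000},
    "core_geometryrecord_8_2": {"geometrytable_id": 8, "date_valid": R_2010},
    "core_geometryrecord_9_1": {"geometrytable_id": 9, "date_valid": R_2000},
}


def tables_to_scan(filters):
    """Return the partitions that the filters cannot rule out."""
    hits = []
    for name, checks in partitions.items():
        contradicted = any(
            column in filters and filters[column] != required
            for column, required in checks.items()
        )
        if not contradicted:
            hits.append(name)  # nothing rules it out, so it gets scanned
    return hits


# Filtering by id only: no constraint mentions id, so nothing is skipped.
by_id = tables_to_scan({"id": 12345})
# Filtering by id plus the partitioning columns: one table remains.
by_key = tables_to_scan(
    {"id": 12345, "geometrytable_id": 8, "date_valid": R_2010}
)
print(by_id)
print(by_key)
```

With only the id, all three tables are part of the delete, so a trigger in that chain cannot alter any of them. With the partitioning columns included, only the one matching table is involved.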

TL;DR: If you partition a Postgres table on some arbitrary (set of) column(s), make sure you delete or update records by providing the columns you partitioned on, so Postgres knows which table to look in and does not have to scan the whole set.

Licensed under: CC-BY-SA with attribution