Question

I'm designing a small system that includes an API and a Database. On the API side I'm familiar with versioning the functions (Endpoints) to allow older clients to continue working.

I'm looking for advice on achieving the same effect for the database. Is there a way to "version" the database tables (or schemas), so to speak, so that older applications continue working? In this case those will be the older versions of the API service that I'm maintaining to support older clients.

One thought that occurred to me is "create a versioned view for every table", so for example if the table is called account I would have a view called account_v1.
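As a sketch of that idea (the table and column names here are just for illustration), the initial versioned view would simply mirror the table as it exists at the time:

```sql
-- v1 view simply mirrors the account table as it stands today;
-- if the table later changes shape, this view's definition is frozen
-- and must be rewritten to re-expose the old shape.
CREATE VIEW account_v1 AS
SELECT id, name, email
FROM account;
```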

I obviously don't have a problem with adding new tables, or even adding columns to existing tables (as long as they have default values or allow NULL).

The biggest problem comes when splitting a table into multiple tables or when dropping columns.

A simple example would be taking a user's email address out of the user table because in version 2 you want to allow the user to have multiple email addresses. I'm quite new to views and triggers, but I think you can set it up so that the older application still sees the email address as a column on the table, and writes to that column get routed to the new separate table.
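A minimal sketch of how that might look on PostgreSQL 9.5, using an `INSTEAD OF` trigger on the compatibility view (all names are illustrative, and picking "the first" address for v1 clients is an arbitrary choice you would need to make deliberately):

```sql
-- New layout: email addresses move to their own table
CREATE TABLE user_email (
    user_id bigint NOT NULL REFERENCES "user"(id),
    email   text   NOT NULL
);

-- v1 compatibility view re-exposes a single email column,
-- arbitrarily picking one address per user
CREATE VIEW user_v1 AS
SELECT u.id, u.name,
       (SELECT e.email
          FROM user_email e
         WHERE e.user_id = u.id
         ORDER BY e.email
         LIMIT 1) AS email
FROM "user" u;

-- Route v1-style UPDATEs onto the new layout. Replacing all of the
-- user's addresses with the single v1 value is one possible policy;
-- it silently discards extra v2 addresses, illustrating the conflict
-- problem discussed below.
CREATE FUNCTION user_v1_upd() RETURNS trigger AS $$
BEGIN
    UPDATE "user" SET name = NEW.name WHERE id = OLD.id;
    DELETE FROM user_email WHERE user_id = OLD.id;
    IF NEW.email IS NOT NULL THEN
        INSERT INTO user_email (user_id, email)
        VALUES (OLD.id, NEW.email);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_v1_upd
    INSTEAD OF UPDATE ON user_v1
    FOR EACH ROW EXECUTE PROCEDURE user_v1_upd();
```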

I'm finding it terribly inconvenient to maintain views for every table, but perhaps that's because I'm new to it and it will get easier? Or are there other rules or methods that make it easier to upgrade and maintain older versions, so that older versions of the application don't break? Is there an easy way to generate the views and triggers automatically, given that the initial view would basically mirror the table at that time? Later, when the table is modified, I would then have to hand-fix the view(s) associated with that table to keep them consistent.

It also seems to me that view version N-2 should use view version N-1 as its base, with view version N-1 referring to view version N. View version N initially refers to the raw table, and when the table is changed that view will be rewritten.

FWIW I'm using Postgres 9.5


Solution

It really is terribly inconvenient.

Some RDBMSes make it a bit easier with things like synonyms, more flexible changes to views, etc. But it's still a massive pain to do the sort of thing you're describing.

PostgreSQL doesn't let you do much to the underlying tables without dropping and re-creating the views and every object that depends on them, which makes them very inconvenient.

Your model of nested views will also quickly get costly in query-planner time. If you use a versioned-schema approach you'll soon end up with a long search_path and lots of schemas; if instead you create name-versioned wrapper views for every database object on every schema change, you'll end up with a huge number of database objects.
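For reference, the versioned-schema variant looks something like this (schema and table names are illustrative); every API version adds another schema of compatibility views and another entry on the search_path:

```sql
-- One schema per API version; v2 holds the real tables,
-- v1 holds compatibility views over them.
CREATE SCHEMA v2;
CREATE SCHEMA v1;

CREATE TABLE v2.account (id bigint PRIMARY KEY, name text);

CREATE VIEW v1.account AS
SELECT id, name FROM v2.account;

-- A v1 client's connection resolves unqualified names against v1 first
SET search_path = v1, v2, public;
```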

Not only that, but dealing with multiple values, upserts, concurrency and locking correctly when you're going via updatable views with rules can get hairy. Take your email address example: you have a row with an email address already set. A v1 client tries to clear it (set it to null); meanwhile a v2 client tries to INSERT a second email address into the separate addresses table. What do you do? There are related problems with upserts and similar operations. It gets ugly fast.

Not to mention that you'll end up with a lot of kludges, like having to aggregate lists of email addresses into a comma-separated field, then de-aggregate them on UPDATE, compare against the aggregated current table contents, and perform a merge operation to add/delete email addresses, all while ensuring no concurrent operations touch the same data. Doing this with triggers and rules is no fun at all and very prone to deadlocks.

So personally ... I wouldn't bother. The consumer of the database is your API service. You can handle the change abstraction and versioning there to a large extent, and that's just what I suggest you do. You probably have a data access layer in the application that can help abstract some of it, but to a large extent it's about updating things so that the new code exposes the old API remapped onto the new backend, rather than maintaining multiple versions of everything.

People who do try to maintain a consistent database "API" usually do so via stored procedures (or, in PostgreSQL, functions). This has some ugly performance consequences, especially since in PostgreSQL the results of non-inlined functions are fully materialised before being returned. It's basically the same approach as a web API anyway: expressing operations on the data as the interface rather than manipulating the data structure directly. If you have a web API, there's no need to duplicate this at the DBMS level as well.
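A sketch of that function-API style, with illustrative names: the function signature is the stable contract, and only its body is rewritten when the underlying tables change. (A simple `LANGUAGE sql` function like this one can sometimes be inlined by the planner; PL/pgSQL functions cannot, which is where the materialisation cost bites.)

```sql
-- Versioned function as the stable interface to account data.
-- When the tables change, only the body is rewritten; v1 callers
-- keep the same signature and result shape.
CREATE FUNCTION get_account_v1(p_id bigint)
RETURNS TABLE (id bigint, name text, email text)
LANGUAGE sql STABLE AS $$
    SELECT a.id, a.name, a.email
    FROM account a
    WHERE a.id = p_id;
$$;
```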

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange