Projects version and database compatibility

https://softwareengineering.stackexchange.com/questions/412323

13-03-2021

Question

We are trying to fix our company's versioning of several modules, and I'm not sure how to account for database compatibility when versioning.

The same database is used by many applications, and we're thinking of using something like SemVer for both the applications and the database.

Some releases bring nothing new, just a few fixes to the database (so probably nothing breaks). The same goes for our Web project, for example, and so on.

But it can also happen that one of our applications gets something new that breaks compatibility with the database. Do we need to track compatibility in both versions?

Currently we have a full-product version, and within the full product each module has its own version.

For example:

Full-product version: 8.0
DB: 31.2.0
Web: 5.4.3
XYZ: 2.0.1
ABC: 7.0.5
etc...

Full-product version: 8.1
DB:  31.3.0 <--- breaks compatibility with Web
Web: 5.5.0 <--- breaks compatibility with database
XYZ: 2.0.1 <--- no changes since last version
ABC: 7.0.8 <--- only fixes, no compatibility break with any other module
etc...

So, for example, we are developing version 9.0 and the Web project will use a new column in a table. So there's a change in both the database and the Web project. Should the SemVer "middle" number change in both modules?

Version 9.0
DB:  31.4.0
Web: 5.6.0

One last thing: during testing, we found that fixing one issue would break compatibility. Should our RC version be updated? Example:

Web: 5.6.1 RC1
Web: 5.6.1 RC2
Web: 5.6.1 RCX? <- this fix creates a compatibility break. Should it be 5.7.0 RC3?

Should the database be versioned just like any other module, or is there another way of doing this and controlling compatibility between modules?

Solution

Is SemVer applicable?

Software using Semantic Versioning MUST declare a public API. This API could be declared in the code itself or exist strictly in documentation. However it is done, it SHOULD be precise and comprehensive.

Can the database be subject to SemVer?

The DB engine is clearly software with an API. Your DBMS probably already uses SemVer (examples here), and you probably already define compatibility requirements in your release notes.
Stored database code like triggers or stored functions is another layer of software that is indirectly (triggers) or directly (stored functions) exposed as an API by the database engine.
The data stored in the database is not software. But the data definition defines a de facto API to access the data, with the tables and the columns (in an RDBMS example) that can be used in queries. In this regard the database definition is semversionable.

How to apply it?

Once your DB is released, the following applies:

Once a versioned package has been released, the contents of that version MUST NOT be modified. Any modifications MUST be released as a new version.

Every change to the database definition, i.e. any DDL command or script, must result in a new version that follows the SemVer rules. For example:

Adding a new table causes in principle no incompatibilities, so it would be a minor version.
Changing a table could be backwards compatible. However, DB changes can very quickly become backwards incompatible. Even a harmless increase in the length of a field might break code that uses a hardcoded length. So you might often have to change the major version. This is why DBAs don't like changing the production database.
Some database practices might reduce this effect. For example, you could expose data via views. You may then change tables or add new columns, as long as you make sure that the views remain backwards compatible.
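To make the rules above concrete, here is a minimal Python sketch that maps kinds of DDL change to the SemVer bump they would need and computes the next database version. The change categories and their names (`add_nullable_column`, `widen_column`, etc.) are hypothetical and only illustrate the idea; which bump a change really needs depends on what your documented API promises.

```python
from enum import IntEnum


class Bump(IntEnum):
    PATCH = 0
    MINOR = 1
    MAJOR = 2


# Hypothetical mapping from a kind of DDL change to the bump it requires.
# These categories are assumptions for illustration, not a standard.
BUMP_FOR_CHANGE = {
    "fix_index": Bump.PATCH,            # no visible change to tables or columns
    "add_table": Bump.MINOR,            # existing queries are unaffected
    "add_nullable_column": Bump.MINOR,  # assumes consumers list columns explicitly
    "widen_column": Bump.MAJOR,         # may break code with a hardcoded length
    "drop_column": Bump.MAJOR,
    "rename_table": Bump.MAJOR,
}


def next_version(current: str, changes: list[str]) -> str:
    """Return the version that follows `current` once all `changes` ship together."""
    major, minor, patch = (int(part) for part in current.split("."))
    # The most severe change in the release decides the bump.
    bump = max(BUMP_FOR_CHANGE[change] for change in changes)
    if bump is Bump.MAJOR:
        return f"{major + 1}.0.0"
    if bump is Bump.MINOR:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


print(next_version("31.3.0", ["add_nullable_column"]))        # 31.4.0
print(next_version("31.3.0", ["add_table", "widen_column"]))  # 32.0.0
```

With this reading, the new column from the question takes the DB from 31.3.0 to 31.4.0, while widening a column in the same release would force 32.0.0.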

Some conventions might also reduce backward incompatibilities, since you may define your API "strictly in documentation" (see SemVer rule 1). You may therefore impose requirements on consumers that preserve backwards compatibility. For example, you can require consumers not to assume that column lengths are fixed, or to always explicitly list the selected columns in a query, etc. I'd strongly recommend treating such documented constraints as part of the API.
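As an illustration of the view approach combined with the "list your columns explicitly" convention, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The table and view names (`customer_data`, `customer`) and the `email` column are made up for the example; other engines have their own rules for views and `ALTER TABLE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_data (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The view, not the table, is the documented API for consumers.
    CREATE VIEW customer AS SELECT id, name FROM customer_data;
    INSERT INTO customer_data (id, name) VALUES (1, 'Ada');
""")

# Consumers list their columns explicitly, as the documented convention requires.
QUERY = "SELECT id, name FROM customer"
before = conn.execute(QUERY).fetchall()

# A later release adds a column to the underlying table only.
conn.execute("ALTER TABLE customer_data ADD COLUMN email TEXT")
after = conn.execute(QUERY).fetchall()

# The view still exposes exactly the columns it was defined with.
view_columns = [col[0] for col in conn.execute("SELECT * FROM customer").description]

print(before == after)  # True
print(view_columns)     # ['id', 'name']
```

Because the consumer only ever sees the view, the table change can ship as a minor database version instead of a major one.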

How does it influence the other software components?

Your Web software for example may be semversioned based on the API it defines. GUI software does in principle not define an API, but often SemVer is applied by analogy.

The important point is that the versioning is based on the API provided, and not the API consumed. So if your Web software decides to query the database differently, this should not affect the software's major version since its own API remains backward compatible.
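One way to track both sides is to have each module declare the version it provides and, separately, the minimum version of each module it consumes. The sketch below is only an illustration: the `Module` class, the `requires` field and the XYZ requirement are hypothetical, and the check is a simplified caret-style rule (same major, at least the minimum) that ignores pre-release tags such as RC.

```python
from dataclasses import dataclass, field


def parse(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def satisfies(provided: str, minimum: str) -> bool:
    """Caret-style check: same major version, and at least the minimum."""
    have, need = parse(provided), parse(minimum)
    return have[0] == need[0] and have >= need


@dataclass
class Module:
    name: str
    version: str  # version of the API this module provides
    requires: dict[str, str] = field(default_factory=dict)  # APIs it consumes


def incompatibilities(release: list[Module]) -> list[str]:
    """List every consumed dependency that the release does not satisfy."""
    provided = {module.name: module.version for module in release}
    problems = []
    for module in release:
        for dep, minimum in module.requires.items():
            if not satisfies(provided[dep], minimum):
                problems.append(
                    f"{module.name} {module.version} needs {dep} ^{minimum}, "
                    f"got {provided[dep]}"
                )
    return problems


# Version 9.0 from the question; the XYZ requirement is an assumption.
release_9_0 = [
    Module("DB", "31.4.0"),
    Module("Web", "5.6.0", requires={"DB": "31.4.0"}),
    Module("XYZ", "2.0.1", requires={"DB": "31.2.0"}),
]
print(incompatibilities(release_9_0))  # []
```

Here Web's own version only reflects the API Web provides; the new column it needs shows up in its `requires` entry, not in its major number.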

Things are more subtle for constraints and invariants:

Suppose you assume that a field should never be null, and suddenly one of the applications writing to that table no longer respects this implicit constraint. This should not happen, because a minor change in one application might then cause other applications to break even though the database schema was not modified:

This is a strong argument in favor of declaring constraints within the database instead of assuming that the applications behave well. With such a strategy the offending application would fail, and you would have to issue a new major database version if you want to allow the weakened constraint (but this time in a controlled manner).
But if you do not want to, or if the constraint cannot be expressed in your DB engine, you could still define such mandatory constraints and invariants in the database documentation. Weakening the constraint in one application would make it non-compliant, unless you also weaken the constraint in the database documentation, which would require a new database version. That in turn lets you better control how the change propagates (or at least be aware of it). But this is soft versioning.
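The first option can be sketched with Python's built-in `sqlite3` module: once the "never null" assumption is declared in the schema, a misbehaving writer fails immediately instead of silently breaking the other applications. The `orders` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The invariant is declared in the schema instead of being assumed.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL)"
)

# A well-behaved application write succeeds.
conn.execute("INSERT INTO orders (id, customer_id) VALUES (1, 42)")

# Another application starts writing NULL; the database rejects the row.
try:
    conn.execute("INSERT INTO orders (id, customer_id) VALUES (2, NULL)")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print(f"rejected: {exc}")

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(row_count)  # 1
```

Allowing NULL later would mean dropping the `NOT NULL` from the schema, which is exactly the visible, versioned change the answer asks for.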

Are there better alternatives?

Soft versioning based on documentation is extremely risky. Database versioning might require frequent major changes, each of which requires an impact assessment on all the applications depending on that database version.

A first strategy is to version not the full database definition but schemas or namespaces within the database. This facilitates dependency management, since the granularity of change is finer.

A second strategy, as Robert described in his comments, is to add a database access layer and manage the versions of its components. The advantage is that it adds a lot of flexibility and allows better decoupling between the data usage and the database. Nevertheless, it only shifts the problem, since you'd still need to know which version of the database access layer is compatible with which database definition. So even if this approach has advantages, it shouldn't dissuade you from versioning the database :-)

Licensed under: CC-BY-SA with attribution
