Question

I know that triggers can be used to validate stored data to keep a database consistent. However, why not perform validation of the data on the application side before storing it in the database?

For example, we store clients, and we want to perform some validation that cannot easily be done at the DDL level: https://severalnines.com/blog/postgresql-triggers-and-stored-function-basics

Another example is auditing.

Update

How do triggers and database transactions work together? For example, suppose I want to validate data being inserted, and the insert happens inside a transaction. Which happens first: is the transaction committed, or is the trigger executed?


Solution

It depends on what kind of application system you are building:

  • If you are creating an application-centric system which contains just one main application, with a dedicated database specifically for this application, and ideally one team responsible for evolving the application and database side-by-side, you can keep all validation logic and audit logic inside the application.

    The main benefit of this is that you do not have to distribute the business logic between application and database, so maintaining and evolving the system often becomes easier. As a bonus, you do not tie the application too closely to a specific type of DBMS or DBMS vendor. This approach is obviously required if your application must be able to run against a lightweight DB system which does not provide triggers.

  • If, however, you create a system where many different applications share a common database, and it cannot be envisioned beforehand which applications will write to it in the future, or which teams will develop applications that fill data into the db in the future, then it is better for the database itself to be responsible for guaranteeing as much data consistency as it can. And that is where triggers get really helpful. In larger systems, referential constraints are often not sufficient for this, but a trigger which calls a stored procedure can implement almost any kind of validation you need.

Another reason for using triggers can be performance: in complex data models, it is not uncommon to encounter complex consistency rules which require a lot of additional data that is not part of the current working set available at the client application. Transferring all that data over the network just to make validation possible on the client side can have a notable performance impact.
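To make this concrete, here is a minimal sketch of server-side validation with a trigger. It uses SQLite through Python's standard library (the post discusses PostgreSQL, but SQLite triggers are easy to run anywhere); the `clients` table, `email` column, and the crude email check are hypothetical examples, not taken from the question:

```python
import sqlite3

# Hypothetical schema: a "clients" table whose email format must be validated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (id INTEGER PRIMARY KEY, email TEXT);

-- Reject bad rows inside the database itself, with no extra network round trip.
CREATE TRIGGER clients_validate_email
BEFORE INSERT ON clients
WHEN NEW.email NOT LIKE '%_@_%'
BEGIN
    SELECT RAISE(ABORT, 'invalid email address');
END;
""")

conn.execute("INSERT INTO clients (email) VALUES ('a@example.com')")  # accepted
try:
    conn.execute("INSERT INTO clients (email) VALUES ('not-an-email')")
except sqlite3.IntegrityError as exc:
    print("rejected by trigger:", exc)
```

Note that the trigger fires as part of the statement, inside the enclosing transaction and therefore before any commit: if the trigger aborts, the statement fails and nothing is committed.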

See also this older SE post: Application Logic Vs DB Triggers for database cleaning

So decide for yourself what kind of system you are building; then you can make an informed decision about whether triggers are the right tool for your case.

OTHER TIPS

I think the question is about responsibility for quality of data.

The answer depends on how you see the system.

If you see the database as an independent, distinct, and autonomous service separate from the application, then the database is responsible for ensuring the consistency and quality of the data it contains, essentially because that database could be used by a different application and cannot rely on every such application having the same consistency and quality behaviours. In these circumstances the database needs to be designed to expose an API and autonomous behaviour. In this view there are at least two applications: one of them is the database and the other is the application using it.

Conversely, the database could be considered a complicated form of file that is under the direct and total control of the application. In this sense the database devolves to being a pure serialisation and document-navigation tool. It may provide some advanced behaviours to support querying and document maintenance (like JSON or XML tools do), but then again it does not have to (like most file streams). In this case it is purely the program's responsibility to maintain the correct format and content within the file. In this view there is one application.

In both views the next question is how to support the usage of the database as either a fancy file or a separate service. You could achieve this by:

  • using the tools that the database platform provides in the form of tables/views/stored procedures/triggers/etc...
  • wrapping the database itself within a service that all clients must use in order to access the database
  • wrapping the database in a library which must be used by all clients in order to access the data.

Each comes with its own pros/cons and will depend upon the architectural constraints of the environment the system operates within.

Regardless of which view you take it always pays to validate data at boundaries.

  • Validate the fields on a UI that a user enters
  • Validate the network/API request before it leaves the client
  • Validate the network/API Request in the server before doing anything
  • Validate the data being passed into business rules
  • Validate the data before being persisted
  • Validate the data after being retrieved from persistence
  • so on and so on

How much validation is warranted at each boundary depends upon how risky it is to not validate it.

  • multiplying two numbers together?
    • if you get the wrong number, is that a problem?
  • invoking a procedure on a given memory location?
    • What is in that memory location?
    • What happens if the object does not exist, or is in a bad state?
  • using a regex on a string containing kanji?
    • Can the regex module handle unicode?
    • Can the regex handle unicode?

No, you should never use triggers to do validation.

The database is only responsible for its own integrity. Any user facing validation should be performed by your application.

Databases perform three levels of validation for integrity. The first is field-level validation: a field can be required, so that a missing value (null) is an error. A check constraint can also restrict a field to an enumerated domain of values.
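Field-level validation can be sketched as follows, again with SQLite via Python's standard library; the `clients` table and the `status` domain are made-up examples:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE clients (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,  -- required field: null is an error
    -- check constraint: the domain has an enumerated set of values
    status TEXT CHECK (status IN ('active', 'suspended', 'closed'))
)
""")

conn.execute("INSERT INTO clients (name, status) VALUES ('Alice', 'active')")  # OK
try:
    conn.execute("INSERT INTO clients (name, status) VALUES (NULL, 'active')")
except sqlite3.IntegrityError:
    print("NOT NULL constraint rejected the row")
try:
    conn.execute("INSERT INTO clients (name, status) VALUES ('Bob', 'banana')")
except sqlite3.IntegrityError:
    print("CHECK constraint rejected the row")
```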

Secondly, there are relations between tables. In one table you store one or more foreign keys, relating this table to other tables and requiring the values to be valid keys in the "other table". Think of an address database where we support addresses from different countries. A country key in an address must point to a known country. Whether the data (e.g. a postal code) is valid is not a concern of this integrity check.
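The address example as a foreign-key constraint might look like this (SQLite via Python; table and column names are illustrative, and note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires enabling FK enforcement
conn.executescript("""
CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT);
CREATE TABLE addresses (
    id      INTEGER PRIMARY KEY,
    street  TEXT,
    country TEXT NOT NULL REFERENCES countries(code)  -- must be a known country
);
INSERT INTO countries VALUES ('DE', 'Germany');
""")

conn.execute("INSERT INTO addresses (street, country) VALUES ('Hauptstr. 1', 'DE')")  # OK
try:
    conn.execute("INSERT INTO addresses (street, country) VALUES ('Main St 1', 'XX')")
except sqlite3.IntegrityError:
    print("unknown country rejected by the foreign key")
```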

Thirdly, and most complicated, are triggers. As a general rule these should address (pun not intended) integrity rules that are conditional. To come back to the address example: if a country does not have postal codes, it would be a problem if an address in that country had a postal code. So the check would be: if this country does not have postal codes, the postal code field must be null.
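A conditional rule like this is exactly what a trigger can express and a plain constraint cannot, because it depends on data in another table. A sketch in SQLite via Python, with a hypothetical `has_postal_codes` flag and made-up sample countries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (code TEXT PRIMARY KEY, has_postal_codes INTEGER);
CREATE TABLE addresses (id INTEGER PRIMARY KEY, country TEXT, postal_code TEXT);
INSERT INTO countries VALUES ('AA', 1), ('BB', 0);  -- hypothetical sample data

-- Conditional rule: a postal code is only allowed if the country has them.
CREATE TRIGGER addresses_postal_code_check
BEFORE INSERT ON addresses
WHEN NEW.postal_code IS NOT NULL
 AND (SELECT has_postal_codes FROM countries WHERE code = NEW.country) = 0
BEGIN
    SELECT RAISE(ABORT, 'country has no postal codes');
END;
""")

conn.execute("INSERT INTO addresses (country, postal_code) VALUES ('AA', '10115')")  # OK
try:
    conn.execute("INSERT INTO addresses (country, postal_code) VALUES ('BB', '12345')")
except sqlite3.IntegrityError:
    print("conditional rule enforced by the trigger")
```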

Validation is the concern of the application. The fact that a German postal code consists of only digits is a check the application should make, not the database. The line is a thin one, so you may need some thinking/discussing in some cases if something should be in a trigger (protect integrity of your database) or in the application (user facing validation).

Auditing is a classic example of using triggers effectively. I've found some errors made by a tester (moving a client from one level of service to another) thanks to an audit table which was implemented with triggers. I highly recommend using triggers for audit.
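An audit trigger in the spirit of that example might look like this (SQLite via Python; the `clients`/`clients_audit` schema is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (id INTEGER PRIMARY KEY, service_level TEXT);
CREATE TABLE clients_audit (
    client_id  INTEGER,
    old_level  TEXT,
    new_level  TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Record every service-level change, no matter which application made it.
CREATE TRIGGER clients_audit_level
AFTER UPDATE OF service_level ON clients
BEGIN
    INSERT INTO clients_audit (client_id, old_level, new_level)
    VALUES (OLD.id, OLD.service_level, NEW.service_level);
END;
""")

conn.execute("INSERT INTO clients (service_level) VALUES ('basic')")
conn.execute("UPDATE clients SET service_level = 'premium' WHERE id = 1")
print(conn.execute("SELECT old_level, new_level FROM clients_audit").fetchall())
# [('basic', 'premium')]
```

Because the trigger lives in the database, it also records changes made directly on the command line, which is what makes it useful for catching mistakes like the one above.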

Validation could be done at the front-end level, but I've seen weird errors in databases that I've handled (people who were born in the year 3000, etc.), and since I made some of them myself, I highly recommend having an extra layer of validation in the database, just in case. Of course, those types of errors could be avoided with check constraints, which are often more effective (in MS SQL they are the preferred method; always check the documentation).

Because the question is about whether we really need triggers in relational databases, here are some other use cases for triggers:

  1. For auditing as described in the other answers.
  2. Auditing in the wider sense: if a database entry is changed, a trigger can record the event for asynchronous post-processing, e.g. nightly exports to another application.
  3. Triggers for views: triggers can be defined INSTEAD OF insert, update, and delete actions. By this means one can insert, update, and delete entries through a view, and the trigger can spread these actions onto multiple tables. This is a way to make a restricted view writable without exposing the details of the underlying tables.
  4. To explicitly save database round trips: assume an N:M relation between tables A and B with an intermediate table R. You can define foreign key constraints from R to A as well as to B, specifying that an entry in R is to be dropped if its corresponding entry in B is deleted. However, the business logic sometimes requires that entries in A must have at least one relation to an entry in B. In this case a trigger on deletion from R can help enforce this logic: if the last entry in R referring to an entry in A is deleted, the trigger can delete that entry in A. In the application-centric view, at least two round trips would be necessary. This is an example of validation; other examples are conceivable. Besides use cases (1), (2), and (3), where triggers save round trips too, consider the case where a user is inserted into a user table and a set of defaults is to be generated in a preference table.
  5. Trust: sometimes database admins change entries on the command line without using your application. Admins work carefully and know what they are doing, but sometimes they might be wrong. If consistency is critical, a trigger is their safety belt.
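Use case (3) above can be sketched with an INSTEAD OF trigger (SQLite via Python; the restricted `clients_public` view and its columns are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT, internal_notes TEXT);

-- Restricted view: hides internal_notes from client applications.
CREATE VIEW clients_public AS SELECT id, name FROM clients;

-- INSTEAD OF trigger makes the view writable by redirecting the insert
-- to the underlying table.
CREATE TRIGGER clients_public_insert
INSTEAD OF INSERT ON clients_public
BEGIN
    INSERT INTO clients (name) VALUES (NEW.name);
END;
""")

conn.execute("INSERT INTO clients_public (name) VALUES ('Alice')")
print(conn.execute("SELECT id, name FROM clients").fetchall())  # [(1, 'Alice')]
```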

As a drawback, business logic becomes distributed between the layers, and this is a major disadvantage for maintenance. As another author wrote, the boundary between application and database is a thin one, and the choice is not always clear. My personal opinion is that triggers place a burden on developers, but they may save time in development. They definitely enhance the user experience, because they boost performance over slow network connections.

Licensed under: CC-BY-SA with attribution