What happened to database constraints?

https://softwareengineering.stackexchange.com/questions/337081

02-01-2021

Question

When I review database models for RDBMS, I'm usually surprised to find few or no constraints (aside from PK/FK). For instance, a percentage is often stored in a column of type int (while tinyint would be more appropriate), and there is no CHECK constraint to restrict the value to the 0..100 range. Similarly, on SE.SE, answers suggesting check constraints often receive comments saying that the database is the wrong place for constraints.
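For readers who have not used one, here is a minimal sketch of the kind of CHECK constraint I mean. It uses Python's built-in sqlite3 module only so that it is runnable as-is; the `discount` table, the `percent` column and the `try_insert` helper are names made up for illustration, and other engines have slightly different DDL syntax.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE discount (
        id      INTEGER PRIMARY KEY,
        percent INTEGER NOT NULL CHECK (percent BETWEEN 0 AND 100)
    )
    """
)


def try_insert(percent):
    """Return True if the database accepted the value, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO discount (percent) VALUES (?)", (percent,))
        return True
    except sqlite3.IntegrityError:
        return False


accepted_valid = try_insert(42)     # within 0..100
accepted_invalid = try_insert(102)  # outside 0..100
```

With this schema, the second insert is refused by the database itself, whichever program or person issues it.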

When I ask about the decision not to implement constraints, team members respond:

Either that they don't even know that such features exist in their favorite database. This is understandable from programmers who only use ORMs, but much less so from DBAs who claim to have 5+ years of experience with a given RDBMS.
Or that they enforce such constraints at the application level, and that duplicating those rules in the database is not a good idea because it violates SSOT.

More recently, I see more and more projects where even foreign keys aren't used. Similarly, I've seen a few comments here on SE.SE which show that the users don't care much about referential integrity, letting the application handle it.

When I ask teams about the choice not to use FKs, they say that:

It's a PITA, for instance when one has to remove an element which is referenced in other tables.
NoSQL rocks, and there are no foreign keys there. Therefore, we don't need them in RDBMS.
It's not a big deal in terms of performance (the context is usually small intranet web applications working on small data sets, so indeed, even indexes wouldn't matter too much; nobody would care if the duration of a given query dropped from 1.5 s to 20 ms).
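On the first objection, a foreign key does not have to make deletion painful. Below is a rough sketch, again with Python's sqlite3 and invented `author`/`post` tables, of a foreign key declared with ON DELETE CASCADE. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; whether cascading is the right policy is a separate design decision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE post (
        id        INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES author(id) ON DELETE CASCADE,
        title     TEXT NOT NULL
    )
    """
)
with conn:
    conn.execute("INSERT INTO author (id, name) VALUES (1, 'alice')")
    conn.execute("INSERT INTO post (author_id, title) VALUES (1, 'first')")
    conn.execute("INSERT INTO post (author_id, title) VALUES (1, 'second')")

# Deleting the parent row also removes the rows that reference it.
with conn:
    conn.execute("DELETE FROM author WHERE id = 1")
remaining_posts = conn.execute("SELECT COUNT(*) FROM post").fetchone()[0]

# A row pointing at a non-existent parent is rejected outright.
try:
    with conn:
        conn.execute("INSERT INTO post (author_id, title) VALUES (99, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```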

When I look at the application itself, I systematically notice two patterns:

The application properly sanitizes data and checks it before sending it to the database. For instance, there is no way to store a value 102 as a percentage through the application.
The application assumes that all the data which comes from the database is perfectly valid. That is, if 102 comes as a percentage, either something, somewhere will crash, or it will simply be displayed as is to the user, leading to weird situations.

While more than 99% of the queries are done by a single application, over time, scripts start to appear: either scripts run by hand when needed, or cron jobs. Some data operations are also performed by hand on the database itself. Both scripts and manual SQL queries have a high risk of introducing invalid values.
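To make the failure mode concrete, here is a small sketch of the situation described above: the application validates, the schema does not, and a one-off statement goes around the application. The `discount` table and the `app_save_percent` function are hypothetical stand-ins for "the application".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No CHECK constraint: the schema relies entirely on the application.
conn.execute("CREATE TABLE discount (id INTEGER PRIMARY KEY, percent INTEGER NOT NULL)")


def app_save_percent(percent):
    """Application write path: validates before touching the database."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percent out of range: {percent}")
    with conn:
        conn.execute("INSERT INTO discount (percent) VALUES (?)", (percent,))


app_save_percent(42)

# A hand-written script or SQL client does not go through app_save_percent().
with conn:
    conn.execute("INSERT INTO discount (percent) VALUES (102)")

invalid_rows = conn.execute(
    "SELECT percent FROM discount WHERE percent NOT BETWEEN 0 AND 100"
).fetchall()
```

The application never stored a bad value, yet the table now contains one, and any code that trusts the data will read it back as if it were valid.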

And here comes my question:

What are the reasons to model relational databases without check constraints and eventually even without foreign keys?

For what it's worth, this question and the answers I received (especially the interesting discussion with Thomas Kilian) led me to write an article with my conclusions on the subject of database constraints.

Solution

It is important to distinguish between different use cases for databases.

The traditional business database is accessed by multiple independent applications and services and perhaps directly by authorized users. It is critical to have a well-thought-out schema and constraints at the database level, so a bug or oversight in a single application does not corrupt the database. The database is business-critical, which means inconsistent or corrupt data may have disastrous results for the business. The data will live forever while applications come and go. These are the places which may have a dedicated DBA to ensure the consistency and health of the database.

But there are also systems where the database is tightly integrated with a single application: stand-alone applications or web applications with a single embedded database. As long as the database is exclusively accessed by a single application, you could consider constraints redundant, provided the application works correctly. These systems are often developed by programmers with a focus on application code and perhaps not a deep understanding of the relational model. If the application uses an ORM, the constraints might be declared at the ORM level in a form more familiar to application programmers. At the low end we have PHP applications using MySQL, and for a long time MySQL did not support basic constraints at all, so you had to rely on the application layer to ensure consistency.

When developers from these different backgrounds meet, you get a culture clash.

Into this mix we get the new wave of distributed "cloud storage" databases. It is very hard to keep a distributed database consistent without losing the performance benefit, so these databases often eschew consistency checks at the database level and basically let the programmers handle them at the application level. Different applications have different consistency requirements, and while Google's search engine prioritizes availability over consistency across its servers, I'm willing to bet their payroll system runs on a relational database with lots of constraints.

Other tips

More and more systems nowadays run in distributed environments, in the cloud, and are designed to "scale out" instead of "scale up". That's even more important if you're dealing with online internet-facing applications, such as e-commerce apps.

That being said, all applications that are supposed to scale are constrained by the CAP Theorem, where you have to choose 2 of 3: Consistency, Availability and Partition Tolerance (network fault tolerance).

By studying the CAP Theorem you'll see that there's not much choice but to give up either Availability or Consistency, since you can NEVER really trust the network 100% of the time.

In general, many applications can afford to be inconsistent for some reasonable amount of time, but cannot afford to be unavailable to their users. For example, a slightly unordered timeline on Facebook or Twitter is better than not having access to a timeline at all.

Thus, many applications are choosing to let go of relational database constraints, since relational databases are really good at Consistency, but at the cost of Availability.

Personal note: I'm old fashioned too, and I've been working with some really old financial systems where data consistency is a first-class requirement most of the time, and I'm a big fan of database constraints. The database constraints are the last line of defense against years and years of bad development and teams of developers that come and go.

"Est modus in rebus". Let's keep using DB "low level" consistency where consistency is a first class requirement. But sometimes, letting it go is not a big sin after all.

-- EDIT: --

Since there's a small edit in the question, there's another legitimate reason to drop constraints in the database, IMO. If you design a product from scratch, where you design your system to support multi-database technology, you may settle for the least common denominator among the supported databases, and eventually drop the use of any constraints at all, leaving all the control logic for your application.

Although it's legitimate, it's also a gray area to me, because I just can't find any database engine today that doesn't support simple constraints like the one proposed in the original question.

What are the reasons to model relational databases without check constraints and eventually even without foreign keys?

First, let's be clear that I'm talking here only about RDBMSs, not about NoSQL databases.

I've seen a few databases with no FK or PK, let alone check constraints, but to be honest they are a minority. Perhaps that's because I work in a big company.

In my experience through the years I can say that some reasons may be:

In the case of beginners or hobby programmers, a lack of modeling skills
Extensive or almost exclusive use of ORMs with no real contact with the database world
Absence of a DBA or other data modeling expert on a team or small project
Lack of involvement of the DBA or data modeling expert in the first stages of the development
Deliberate design decisions by a part of the developer community that considers that even a check constraint that enforces that a certain column can only have 1, 2 or 3 as a value, or that the "age" column must be >= 0, is "having business logic in the database". Even default clauses are considered by some as business logic that doesn't belong in a database, as you can see in several recent questions and answers on this very site. Developers who hold this view will obviously use as few constraints as possible and do everything in code, even referential integrity and/or uniqueness. I think this is an extreme position.
Use of RDBMSs as key-value stores, either to emulate NoSQL behavior or because the requirements were simple enough to be satisfied by using RDBMS tables as isolated key-value repositories.
Assuming that the database will always be written to by "the app" and that nobody will ever need to do a massive data load, or edit or insert rows via a SQL client (in many cases to correct bad data the app inserted). Even in the best-case scenario there will always be another app (besides "the app") issuing DML instructions to the database: a SQL client.
Not realizing that the data belongs to the business owner, not to the app.

That said, I would like to state that RDBMSs are very advanced pieces of software that have been built on the shoulders of giants and have proved very efficient for a lot of business requirements, freeing programmers from the mundane tasks of enforcing referential integrity on a series of binary files or text files. As I always say, "we no longer live in a one-app-one-database world". At the very least, an SQL client will issue DML besides "the app". So the database should defend itself from human or programming errors to a reasonable extent.

For those well-known types of requirements where an RDBMS won't scale well, by all means embrace NoSQL technology. But the proliferation of relational databases with no constraints is worrying, with thousands of lines of code (generated or typed) devoted to enforcing what the RDBMS should be enforcing for you in more efficient ways.

There are external constraints that drive technology decisions. There are just a few situations where you have the need and/or luxury of using database field constraints on a regular basis.

Enterprises have developers for both apps and databases, along with DBAs, but most developers do not work in this type of environment. They do as much as they can in code. Also, some on the database side don't get involved in the business rules. They are primarily there to keep things running. They'll never push for constraints in the db. When you have to deal with legacy apps, integrations, migrations, mergers and acquisitions, a db constraint may be the best solution.
Overloading the db can create a bottleneck that isn't easily solved by throwing more machines at the problem. There are some situations where the db language doesn't handle some programming problems without a major performance hit, so you can't plan on using a constraint for everything. Stack Overflow has one database server because throwing 2 at a problem is a challenge.
Automated Testing - they're getting there but many db developers are late to the party along with the IDE/testing frameworks.
Deployment - more db stuff makes it more complicated. What happens when an update to a client's database isn't allowed because there are data that violate the constraint? Game over unless you have a way to address this. In your app, you may decide to let the user handle this as needed or instruct some admin to do it in a batch.
Only the app/api/service will ever write data to the database, so why bother? This does hold up most of the time, which is why constraints aren't common.
Handling db errors is hard enough without hundreds of constraint violations to contend with if everything gets out of whack. Most are happy making a connection and getting the table name correct.
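On the deployment point, one way to avoid the "game over" moment is to look for violating rows before the update that introduces the constraint. The following is only a sketch under assumed names (a `discount` table with a `percent` column and a hypothetical `find_violations` helper), using Python's sqlite3 so it can be run as-is; a real migration would use whatever tooling the project already has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Existing client database, created before anyone thought about constraints.
conn.execute("CREATE TABLE discount (id INTEGER PRIMARY KEY, percent INTEGER NOT NULL)")
with conn:
    conn.executemany(
        "INSERT INTO discount (id, percent) VALUES (?, ?)",
        [(1, 10), (2, 102), (3, -5), (4, 100)],
    )


def find_violations(connection):
    """Return the ids of rows that a 0..100 CHECK on percent would reject."""
    rows = connection.execute(
        "SELECT id FROM discount WHERE percent NOT BETWEEN 0 AND 100 ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]


violations = find_violations(conn)
safe_to_add_constraint = not violations
```

The list of offending ids can then be handed to a user or an admin to fix in a batch before the constraint is added.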

Many development teams do not want to give too much control to a db developer. You're lucky if you get more than one, so vacations are a lot of fun. Not many require absolute control over the database domain and take responsibility for every query, business rule, performance, availability, security, and what data go to what RAID. Here are the stored procedures you are allowed to execute. Have fun. Don't even think about touching a table.

This is a problem that I have struggled with all my career (nearly 40 years) and also when writing my DBMS. A description of my end point is here: http://unibase.zenucom.com. So here are my thoughts.

Generally speaking, most constraints are better handled in the application so that different parts of the application can enforce different constraints, e.g. a state code might not apply in all jurisdictions.
As an aside, beware of %. Markups are > 100% or you go broke :)
Constraints are best described negatively, i.e. what they can't be, not what they should be. It's always a simpler list.
Foreign keys are always good and should be used. Full stop. FK is one of the few semantic constructs in an RDBMS and very useful. The biggest difficulty is deciding whether to let a value dangle if the referenced record is removed or to use dependent rows as a reason not to delete the referenced record.
Constraints in the real world are usually more complex than a single field value restriction.
Some constraints, even at the application level, work against good operations, e.g. aggressive date checking hides errors in apparently good dates. You need operator error to get a measure of errors in otherwise sensible looking dates.
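The foreign key dilemma mentioned above maps fairly directly onto the ON DELETE clause. As a hedged illustration only (the `customer`, `note` and `invoice` tables are invented, and SQLite via Python is used just to keep it runnable), SET NULL is the closest declared equivalent of letting the reference go, while RESTRICT uses dependent rows as a reason to refuse the delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
# Option 1: the reference is cleared when the parent goes away.
conn.execute(
    "CREATE TABLE note (id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customer(id) ON DELETE SET NULL)"
)
# Option 2: dependent rows block the delete.
conn.execute(
    "CREATE TABLE invoice (id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customer(id) ON DELETE RESTRICT)"
)
with conn:
    conn.execute("INSERT INTO customer (id) VALUES (1), (2)")
    conn.execute("INSERT INTO note (id, customer_id) VALUES (1, 1)")
    conn.execute("INSERT INTO invoice (id, customer_id) VALUES (1, 2)")

with conn:
    conn.execute("DELETE FROM customer WHERE id = 1")  # only a note refers to it
note_customer = conn.execute("SELECT customer_id FROM note WHERE id = 1").fetchone()[0]

try:
    with conn:
        conn.execute("DELETE FROM customer WHERE id = 2")  # an invoice refers to it
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True
```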

Database constraints might have been a smart idea, but what about a practical use for them? Take your percentage constraint. If you apply that, your DB will happily reject invalid percentages. And then? You will need business logic to handle the exception, which actually means that the business logic that wrote the wrong percentage has already failed elsewhere. In short: the only practical constraints left are the ones you already see (like PK/FK).

More often these days, people are using software (e.g. Entity Framework) to generate tables and columns automatically. The idea is that they do not need SQL skills, freeing up brain capacity.

Expectations that software will "work things out" are often unrealistic, and it doesn't create the constraints that a human would.

For best results, create tables using SQL and add constraints manually, but sometimes people cannot do this.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange