Database constraints - keep or ignore?

https://stackoverflow.com/questions/3700920

02-10-2019
|

Pergunta

When I was learning in university, they taught us the database fundamentals, basics and rules, and one of the most important rules is the constraints (primary key, foreign key), and how to make 1-m, 1-1, m-n relationships.

Now when I move to real business environment they tell me: you should forget all you have been taught; no constraints, all those relationships are logical, no primary keys, no foreign keys, you can make your constraints through the code.

I don't know who is right: what I learned in my academic life or what I will learn in my new real business life. What do you think?

Solução

I think the constraints help you to have clean data. Performance is sometimes improved. In some cases, the performance can get affected by having the constraints. However, the answer to that is not removing the constraints. You have something called "denormalization" to help you deal with the performance issues (provided that your queries are already optimized). You can always create denormalized summary tables in such scenarios.

Did the guys who told you to "forget what you learnt" also tell you that they have forgotten the traffic rules they learnt at the driving classes?

Outras dicas

If somebody told me to ignore keys and constraints on my databases, I would promptly ignore them and go about my business.

Primary keys, foreign keys, and constraints are there for a reason. Use them. They'll make your life easier and your database easier to understand (and, quite often, more performant).

The longer I work with databases, the more I appreciate constraints. In the long run, they save me a lot of time. Only trusted constraints ensure 100% validity of data.

I wrote a chapter on usage of constraints vs. other ways of ensuring data integrity, available as free download here

If you think that constraints are merely about primary keys and foreign keys, then in fact you haven't been taught much of the "fundamentals" to begin with.

I suggest you take a look at "An Introduction to Relational Database Theory" by Hugh Darwen, which is available freely online. And at least you get a genuine education about the "fundamentals" from that one.

Database constraints are a great idea when you have users accessing the database whom may corrupt your data(i.e. any user). I tend to keep Foreign key constraints in place to ensure that anyone editing data in my database is aware that other tables are relying on the data in the current table.

Well it's certainly true that there are very few ultimate truths out there, and it's also true that many commerically successful products eschew declared referential integrity (DRI).

On the other hand, if you care about the data in your database, there is almost no better way to safeguard the integrity of that data than through DRI.

If you leave it all up to the code on top, you're kind of banking on the hope that no one will ever access the database through any other means. If they do there will be nothing stopping data corruption (orphaned rows, inconsistent and illogical data).

What you learned isn't just academic stuff. But yes it's like Plato's Utopia at times. It's the perfect condition your database can be in, the ideal design. But that ideal design isn't always possible.

Constraints should be as close to DB data as possible. Think about it this way. What if you wrote your constraints in code and later you wanted to migrate to a different language/platform and an error cropped up in one of your constraints? It'd be disastrous. Things like PK, FK, constraints etc. are used widely. They've been used for more than 30 years now. So, they're not junk but in certain scenarios they're just not manageable. For example, if you're Google, you can't just rely on a relational model to give answers in milliseconds.

So based on requirements like speed and stability, we sometimes duplicate data too, we don't use PKs, or we don't establish relationships etc. But only when we're looking for something specific AND when we know what we'll lose by doing it that way.

In the end, relational model is still just a model. It's a way of representing things. A very successful way but it's not a godsend so in some cases it has to be compromised.

I've spent a lot of time fix ng the crap data produced by incompetents who think the constraints belong in the application code. If a database is designed without these required things, it will have bad data. Do a quick check of any system like this that has been running for several years and you will find orphaned records and missing required information etc.

When I was in class, we accessed tables with raw SQL, and there is a lot of raw SQL or the equivalent out there. In these cases, constraints are generally good.

However, there are systems that use databases as back ends, and these databases are only accessed by that particular software system. In this case, the software should keep track of the necessary relations and constraints, and the database serves as a redundant check that doesn't provide good feedback and lowers performance.

The database constraints are redundant because the attached system needs to maintain the constraints itself. The feedback is that a certain database constraint was violated. If a program is capable of dealing with such feedback, it's capable of doing its own checks. The performance cost should be obvious.

The constraints can still be useful on development or test systems, but when the system goes into production about all they can do is crash the system if something goes wrong, and that's usually exactly what you don't want to happen in a large system like that.

Primary keys are important. You need a way to uniquely ID rows.

But, its been my experience that if you properly encapsulate your database access within classes (ie, reading/writing objects to/from the db), constraints arent generally necessary. Yeah, I might use them if 50 different apps in 10 different languages were using the same database. But if its one app, or a common suite of apps sharing a source code base, I'd rather have all the database manipulation logic in one place to make the app more maintainable. Same goes for stored procedures, but they have the additional issue of portablity between db systems if you write code meant to handle a wide variety of databases.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a StackOverflow