Question

It's been a long time since I studied relational design, but I had a vague memory that it encourages not splitting a table unnecessarily. For instance, given the functional dependencies

K -> A
K -> B
K -> C

my assumption was that the "best" schema is just {KABC} and not something like {KAB, KC} or even {KA, KB, KC}. At least in practice that is how I've seen database designers implement the table.

However, a quick refresher on Wikipedia indicates that

  • the normalization formalism doesn't make any statement in the direction of obtaining a "minimal schema", and
  • 6NF would even require {KA, KB, KC}. Since 6NF implies the other normal forms, none of them can impose such a minimality requirement either.

I'm a bit confused that I had this wrong all along. Does the notion of "obtaining a minimal number of tables" really play no role in formal relational design, and is it just common practice?

Solution

The “Normal Forms” are narrowly defined in terms of eliminating redundant data and “update anomalies”. Whether fixing other schema design problems counts as “Normalization” can be debated, but in general parlance Normalization just means ensuring the database complies with some Normal Form.
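
To make the question's example concrete, here is a minimal sketch in plain Python (not tied to any database engine) that splits a relation with key K into {KA, KB, KC} and joins the pieces back together. The sample rows and the helper names `project`, `natural_join` and `as_set` are made up for illustration. Because K determines A, B and C, the round trip reproduces the original rows exactly, which is why the normal forms have no objection to either shape.

```python
# Illustrative only: rows are dicts, relations are lists of rows.

def project(rows, attrs):
    """Project rows onto attrs, dropping duplicates (set semantics)."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def natural_join(left, right):
    """Join two relations on all attribute names they share."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]

def as_set(rows):
    """Order-independent representation, for comparing relations."""
    return {tuple(sorted(row.items())) for row in rows}

# Hypothetical sample data satisfying K -> A, K -> B, K -> C.
kabc = [
    {"K": 1, "A": "x", "B": 10, "C": True},
    {"K": 2, "A": "x", "B": 20, "C": False},
    {"K": 3, "A": "y", "B": 10, "C": True},
]

ka = project(kabc, ["K", "A"])
kb = project(kabc, ["K", "B"])
kc = project(kabc, ["K", "C"])

rejoined = natural_join(natural_join(ka, kb), kc)
lossless = as_set(rejoined) == as_set(kabc)
print(lossless)
```

Both schemas carry the same information here; choosing between them is a matter of convenience and performance rather than of normalization.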

OTHER TIPS

Reducing the number of tables isn’t a goal, either in design theory or in practice. Having fewer tables can help or hurt performance, which is why in practice people both split and merge tables (regardless of whether that duplicates data).

Roughly speaking, increasing the number of tables is helpful when you have data that is rarely used in conjunction, while decreasing the number of tables is helpful when you have data that is frequently used in conjunction. Relational theory doesn’t care how fast or slow something is. Theory keeps you from getting into invalid states. Practically speaking, some invalid states may be acceptable as long as everything comes out right in the end.

To add to the other answers about normalization, outside of theory on how a database should be structured to represent the data, there can be practical considerations that might make splitting a table sensible.

One possible implementation of ACID, found in Postgres for example, effectively deletes and reinserts the whole row on every update and later reclaims the dead row versions through vacuuming. So if you have a table that holds a lot of mostly-read data plus a dirty bit to mark rows for processing, splitting the dirty bit into a separate table can significantly improve performance, since UPDATEs then rewrite only a small fraction of the data. It can also save you loads of disk space, since in the single-table case every update duplicates the full row until a VACUUM can reclaim the dead tuples.

Similar problems can occur when part of the data is so big it gets TOASTed, so you might want to split commonly accessed or updated fields into a separate table.
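
The dirty-bit split described above might look roughly like the following sketch. It uses Python's built-in `sqlite3` module only so that it runs anywhere; SQLite does not have Postgres's row-versioning behaviour, so this shows the schema shape, not the performance effect. The table and column names (`document`, `document_dirty`, `payload`, `dirty`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Wide, mostly-read data lives in one table; the frequently updated
# flag lives in a narrow companion table sharing the same key.
conn.executescript("""
CREATE TABLE document (
    id      INTEGER PRIMARY KEY,
    payload TEXT NOT NULL
);
CREATE TABLE document_dirty (
    id    INTEGER PRIMARY KEY REFERENCES document(id),
    dirty INTEGER NOT NULL DEFAULT 0
);
""")

docs = [(i, "x" * 1000) for i in range(1, 6)]
conn.executemany("INSERT INTO document (id, payload) VALUES (?, ?)", docs)
conn.executemany(
    "INSERT INTO document_dirty (id) VALUES (?)", [(i,) for i, _ in docs]
)

# Marking rows for processing touches only the narrow table, so the
# large payload column is never rewritten by this UPDATE.
conn.execute("UPDATE document_dirty SET dirty = 1 WHERE id IN (2, 4)")

# The worker joins the two tables to fetch the rows it must process.
pending = conn.execute("""
    SELECT d.id, length(d.payload)
    FROM document AS d
    JOIN document_dirty AS f ON f.id = d.id
    WHERE f.dirty = 1
    ORDER BY d.id
""").fetchall()
print(pending)
```

The cost of this layout is an extra join whenever the flag and the payload are needed together, which is the trade-off the earlier answer describes.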

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange