What is a good KISS description of Boyce-Codd normal form?

https://stackoverflow.com/questions/539324

22-08-2019
|

Question

What is a KISS (Keep it Simple, Stupid) way to remember what Boyce-Codd normal form is and how to take a unnormalized table and BCNF it?

Wikipedia's info: not terribly helpful for me.

Solution

Chris Date's definition is actually quite good, so long as you understand what he means:

Each attribute

Your data must be broken into separate, distinct attributes/columns/values which do not depend on any other attributes. Your full name is an attribute. Your birthdate is an attribute. Your age is not an attribute, it depends on the current date which is not part of your birthdate.

must represent a fact

Each attribute is a single fact, not a collection of facts. Changing one bit in an attribute changes the whole meaning. Your birthdate is a fact. Is your full name a fact? Well, in some cases it is, because if you change your surname your full name is different, right? But to a genealogist you have a surname and a family name, and if you change your surname your family name does not change, so they are separate facts.

about the key,

One attribute is special, it's a key. The key is an attribute that must be unique for all information in your data and must never change. Your full name is not a key because it can change. Your Social Insurance Number is not a key because they get reused. Your SSN plus birthdate is not a key, even if the combination can never be reused, because an attribute cannot be a combination of two facts. A GUID is a key. A number you increment and never reuse is a key.

the whole key,

The key alone must be sufficient [and necessary!] to identify your values; you cannot have the same data represented by different keys, nor can a subset of the key columns be sufficient to identify the fact. Suppose you had an address book with a GUID key, name and address values. It is OK to have the same name appearing twice with different keys if they represent different people and are not the "same data". If Mary Jones in accounting changes her name to Mary Smith, Mary Jones in Sales does not change her name as well. On the other hand, if Mary Smith and John Smith have the same street address and it really is the same place, this is not allowed. You have to create a new key/value pair with the street address and a new key.

You are also not allowed to use the key for this new single street address as a value in the address book since now the same street address key would be represented twice. Instead, you have to make a third key/value pair with values of the address book key and the street address key; you find a person's street address by matching their book key and address key in this group of values.

and nothing but the key

There must be nothing other than the key that identifies your values. For example, if you are allowed an address of "The Taj Mahal" (assuming there is only one) you are not allowed a city value in the same record, since if you know the address you would also know the city. This would also open up the possibility of there being more than one Taj Mahal in a different city. Instead, you have to again create a secondary Location key with unique values like the Taj, the White House in DC, and so on, and their cities. Or forbid "addresses" that are unique to a city.

So help me, Codd.

OTHER TIPS

Here are some helpful excerpts from the Wikipedia page on Third Normal Form:

Bill Kent defines Third Normal Form this way:

Each non-key attribute "must provide a fact about the key, the whole key, and nothing but the key."

Requiring that non-key attributes be dependent on "the whole key" ensures that a table is in 2NF; further requiring that non-key attributes be dependent on "nothing but the key" ensures that the table is in 3NF.

Chris Date adapts Kent's mnemonic to define Boyce-Codd Normal Form:

"Each attribute must represent a fact about the key, the whole key, and nothing but the key." Here the requirement is concerned with every attribute in the table, not just non-key attributes.

This comes into play when a table has multiple compound candidate keys, and an attribute within one candidate keys has a dependency on a part of another candidate key. Third Normal Form wouldn't prohibit this, because it excludes key attributes. But BCNF applies the rule to key attributes as well.

As for how to make a table satisfy BCNF, you need to represent the extra dependency, with another attribute and possibly by splitting attributes into another table.

I googled "boyce codd normal form" and after wikipedia this is the second result. My textbook gives a very simple definition in terms of relational database management systems:

The left side of every nontrivial FD must be a superkey.

-"Database Systems The Complete Book" by Garcia-Molina, Ullman and Widom.

The best informal answer I've read is that, in BCNF, every "arrow" in every functional dependency is an "arrow" out of a candidate key. I don't recall the source, but it was probably something Chris Date wrote.

Basically Boyce-Codd is "fifth normal form". It is visually recognizable by the existance of "Attributive entities" in the data model, for things like Types (e.g. roles, status, process state, location-type, phone-type, etc). The attributive entities (sub-subtypes) are lists of finite sets of values that further categorize a class level entity. So you may have a phone-type ('mobile', ' desk', 'VOIP') email account type ('business', 'personal', 'gaming'), role (project manager, data modeler, super model) etc. Another morphological clue is the existance of super-types, (aka. master-classes, super-classes, meta-entities) such as Parties (subtypes being company, person, etc.).

It's basically Taxonomy gone wild (..no the video is not that exciting) to the atomic or leaf-level; see Bill Karwin's comment above for a more technical explanation.

Boyce-Codd level models are essentially highly detailed logical models, derived from more simplistic business-based conceptual models. **They are typically NOT implemented ver batim in the PHYSICAL model, because PDM optimization for performance (or functional simplicity) may result in the super-types and attributive entities being managed as drop-down lists in UIs, or in behind the scenes logic in the application, or in database constraints and methods to enforce referential integrity. (i.e. they may end up as look-up tables in the PDM schema, or they may be handled by code and not represented in the database).

So - why do them if they may not end up in the PDM? For the same reason you build a good 3NF model before you 'optimize', so that the database structure reflects the real world and is hence more stable than the typical kludges we inherit and have to do heroic acts to make work as our business/clients requirements change.

Often times it is easiest to listen to your gut and this will come naturally. Generally speaking, if you meet 3NF you have met BCNF. This doesn't cover detailed analysis of an ERD or have examples but there are thirteen rules according to Codd. I find it best to follow these rules but always remember there is no one correct way to do things so follow them loosely. So regarding the RDBMS, here are the rules:

http://www.87android.com/12-rules-of-relational-database-model-by-codd/

This may not answer the question directly, but if you are asking about how to get to BCNF or an easy way to remember it then you don't understand normalization well enough. That is of no concern though. Relational databases take many forms and very few are done well. The best thing you can do is know what it means to be relational, follow the rules above, and do not worry about the level of normalization. The process of normalization eliminates the duplication of data. Each level more so by moving into migration of functional dependencies. Keep that in mind and you will be fine, your gut and intellect will do the rest.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow