Question

I have a database with 4 tables, each with around 20 columns. I need to determine whether the whole DB conforms to BCNF. I assume the first step is to find the functional dependencies, but I am not sure how to do that; there are so many columns! What would the correct approach be?

Solution

The easiest way to do this would be to buy a computer-aided software engineering (CASE) tool that supports reverse-engineering. These tools can examine a SQL database, and generate every possible 5NF schema from the existing tables. I first saw tools like these in the 1980s, I think; I don't know which ones are still available.

If you do it yourself, then yes, you'd have to first determine all the functional dependencies. You have to know the meaning of each column to do that reliably. If you know the meaning of each column, and you know the business environment, you might be able to determine many of the functional dependencies, if not all of them, simply by looking at the column names.

If you don't know the meaning of each column, the way forward is still simple, but it's not easy.

Simple: Using SQL, count the distinct values of each possible combination of columns. If one set of columns functionally determines another, then selecting the distinct values of the first set returns the same number of rows as selecting the distinct values of both sets together. You still have to look at the meaning; the counts might match by coincidence, because the current data can disprove a dependency but cannot prove one.
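As a rough sketch of that check, here is one way to run it from Python against SQLite. The table `orders`, its columns, and the helper name `fd_holds` are all made up for illustration; they are not from the question. The helper compares the number of distinct left-hand-side values with the number of distinct combined values.

```python
import sqlite3

def fd_holds(conn, table, lhs, rhs):
    """Return True if the current rows are consistent with lhs -> rhs.

    Assumes table and column names are trusted identifiers (they are
    interpolated into the SQL text, not bound as parameters).
    """
    lhs_cols = ", ".join(lhs)
    all_cols = ", ".join(list(lhs) + [c for c in rhs if c not in lhs])
    count_lhs = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT {lhs_cols} FROM {table})"
    ).fetchone()[0]
    count_all = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT {all_cols} FROM {table})"
    ).fetchone()[0]
    # Equal counts mean no lhs value is paired with two different rhs values.
    return count_lhs == count_all

# Hypothetical sample data, only to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, customer_city TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, "Oslo"), (2, 10, "Oslo"), (3, 11, "Oslo"), (4, 12, "Rome")],
)

print(fd_holds(conn, "orders", ["customer_id"], ["customer_city"]))  # True
print(fd_holds(conn, "orders", ["customer_city"], ["customer_id"]))  # False
```

A True result only means the data on hand does not contradict the dependency; whether it is a real business rule still has to be judged from the meaning of the columns.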

Not easy: There are a lot of combinations of columns. For a table of 20 columns, there are about a million combinations. (If I had to do this, I'd write a program to generate all the SQL statements, and store their results in a table for later analysis.)
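A generator along these lines could produce the statements. This is only a sketch: the function name `distinct_count_queries` and the `max_size` cutoff are my own assumptions, and the SQL uses a derived-table form that works in SQLite but may need an alias in other dialects.

```python
from itertools import combinations

def distinct_count_queries(table, columns, max_size=None):
    """Yield (column_subset, sql) for every non-empty subset of columns.

    max_size is a hypothetical knob to cap the subset size, since the
    full set of subsets grows as 2**len(columns) - 1.
    """
    limit = len(columns) if max_size is None else max_size
    for size in range(1, limit + 1):
        for subset in combinations(columns, size):
            cols = ", ".join(subset)
            sql = f"SELECT COUNT(*) FROM (SELECT DISTINCT {cols} FROM {table})"
            yield subset, sql

# Three made-up columns give 2**3 - 1 = 7 statements.
queries = list(distinct_count_queries("t", ["a", "b", "c"]))
for subset, sql in queries:
    print(subset, "->", sql)
```

Each subset and its resulting count could then be written to a results table, so that pairs of counts can be compared afterwards without rerunning the queries.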

The harder part

In the general case, you have no way of knowing whether there are functional dependencies between tables that, up until now, have been created and maintained by application code. (Your particular case might be different.) It could be that some of those columns are in the wrong table, but that application code is holding things together.

The phrase combinatorial explosion comes to mind. Four tables of 20 columns each, with no knowledge of the meaning or business environment, gives you about 1.2 x 10^24 combinations to test. That's not practical; you have to reduce the scope by understanding the meaning of the columns, or just ignore the possibility of inter-table dependencies until you have a better structure to start with.
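For reference, both figures are consistent with counting subsets of columns, assuming exactly 20 columns per table and treating all 80 columns as one pool for the inter-table case:

```python
# Subsets of the columns in one 20-column table.
per_table = 2 ** 20
# Subsets of all 80 columns across the four tables taken together.
all_tables = 2 ** 80

print(per_table)             # 1048576
print(f"{all_tables:.1e}")   # 1.2e+24
```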

Licensed under: CC-BY-SA with attribution