Minimal 3NF normalization issues

https://dba.stackexchange.com/questions/134644

01-10-2020
|

문제

I'm trying to learn how to normalize to 3NF but the example that my teacher gave us doesn't really help.

The algorithm I always used is this one:

Find a canonical cover F_c for F
For each FD X->Y in F_c create a relation with schema XY
Eliminate a relation if its schema is a subset of another
Merge those remaining relations into a relation which has all the keys for the db

I also tried the algorithm found in this other question but I'm still failing for this particular example.

So, here's the example (starting from the canonical cover):

R<A,B,C,D,E,F,H>
C->DF; A->H; HF->C; B->E; AD->B; E->D; AB->F
Keys = {AC, AF, AD, AB, AE}

Since we already have the canonical cover, let's follow the second step:

<R1(C,D,F), {C->DF}>
<R2(A,H), {A->H}>
<R3(C,F,H), {HF->C; C->F}>
<R4(B,E), {B->E}>
<R5(A,B,D), {AD->B; B->D; A->D}>
<R6(D,E), {E->D}>
<R7(A,B,F), {AB->F; F->B}>

Before going on, here's the first question: he takes some of the projections from the schema, but there's one that really doesn't make any sense to me. I'm talking about A->D from R5.

The paths leading to D are the ones with E or C, so let's see if we can use them:

C is reached by HF, where H is reached by A and F by AB, meaning that A can't get to C without B
E is reached by B, where B is reached by AD, meaning again that A can't get to E without D

So the first question is: why does he take that A->D projection if isn't there a way for A alone to reach D? Am I missing something?

Please note that the same question applies to F->B in R7, because I don't think that F can reach B without A, that's why I think it should be AF->B

Coming back to the example, let's go on with the algorithm. As we can see there are no relations sharing attributes, so there's nothing to merge/eliminate at step 3.

It's now time for step 4, but his result is the following (he didn't write the FDs here):

R1'(A,B,D,E) - merged R4, R5, R6
R2'(A,B,C,F,H) - merged R2, R3, R7
R3'(C,D,F) - R1 was not merged and didn't change

The problem here is that we're presented with this final result without any explanation on how we got here. He just merged those relations and I really don't understand why.

Can you please try to explain how this works?

해결책

A. Canonical Cover

The example is almost a canonical cover, since in a canonical cover each dependency has only an attribute on the right part, so the dependency

C → DF

must be replaced by the two dependencies:

C → D
C → F

B. Second step of the 3NF Synthesis Algorithm

The step says to divide the functional dependencies in groups with the same left part, and create a relation scheme with all the attributes of each group, not to create a relation for each functional dependency. In this case the effect is the same as your solution.

So the second step produces the following schemas:

R1 <(A H), { A → H }>    
R2 <(A B D), { A D → B }>    
R3 <(A B F), { A B → F }>    
R4 <(B E), { B → E }>    
R5 <(C D F), { C → D, C → F }>    
R6 <(D E), { E → D }>    
R7 <(C F H), { F H → C }>

Note that in this the decomposition the dependencies shown are only the original dependencies used to produce the decomposition, since in the algorithm the projection of the dependencies is not calculated, and this is due to the fact that this computation requires an exponential algorithm.

If, however, we look at the projection of the dependencies, here you are correct, since both A → D and F → B are not present in the projection. Here is the above decomposition together with a projection of the original dependencies over the subschema:

R1 <(A H), { A → H }>    
R2 <(A B D), { A D → B, B → D }>    
R3 <(A B F), { A B → F, A F → B }>    
R4 <(B E), { B → E }>    
R5 <(C D F), { C → D, C → F }>    
R6 <(D E), { E → D }>    
R7 <(C F H), { F H → C, C → F }>

C. Third step of the 3NF Synthesis Algorithm

Correct

D. Fourth step of the 3NF Synthesis Algorithm

Here again the step is incorrect. The fourth step simply says: “if no schema exists with a key of the original relation, than add a new schema with this key”. See for instance Algorithm 16.6, R.Elmasri, S.B. Navathe, Foundamentals of Database Systems, 6th Edition, pag. 561 (or any other good book on databases):

If none of the relation schemas in D contains a key of R, then create one more relation schema in D that contains attributes that form a key of R.

And since in this case there exist relations with a key, no other relation must be added, and the final decomposition is not modified:

R1 <(A H)>    
R2 <(A B D)>    
R3 <(A B F)>    
R4 <(B E)>    
R5 <(C D F)>    
R6 <(D E)>    
R7 <(C F H)>

So the solution given, with R1', R2' and R3' is not correct and it is not in Third Normal Form. It is easy to show this. Taking for instance the second relation,

R2' (A B C F H)

and projecting the original dependencies over it, we found the following dependencies:

{ A F → B
  A B → C
  A → H
  C → F
  F H → C }

with keys:

{ (A B) (A C) (A F) }

So we can see that the dependency A → H violates the Third Normal Form (A is not a (super)key and H is not a prime attribute).

As shown in the answer, if the relations must still be in 3NF, the final solution with three schemas is not the minimal 3NF normalization, since the relation R2' is not in 3NF. In other words, the solution is incorrect.

The classical synthesis algorithm produces the decomposition in 7 relations. I am not aware of any published algorithm that will minimize the number of relations obtained by that algorithm. You could try to combine relations and check if the combination is still in 3NF, but note that checking if a relation is in 3NF is an exponential task, since one should find all the keys, to exclude the possibility that a determinate be a prime attribute.

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 dba.stackexchange