relational database design (normalizing many-to-many mappings)

https://stackoverflow.com/questions/4229060

26-09-2019

Question

The following is an analogous (and simplified) example to the design question I'm facing:

Suppose you have students, classes, and grades. Students can be in many different classes. Each class has many different students. And every (student, class) pair has one grade.

Should I lay out the database (MySQL) like:

Option 1)

students table - (student_id, student_name)
classes table - (class_id, class_name)
students_classes table - (student_class_id, student_id, class_id)
grades table - (student_class_id, grade)

Option 2)

students table - (student_id, student_name)
classes table - (class_id, class_name)
grades table - (student_id, class_id, grade)

Or should it be designed as something else? Option 2 seems simpler now, but in the future, I might need other statistics related to each (student_id,class_id) pair (in which case, option 1 seems a bit better? Option 1 still feels a bit overly complicated though).

What do you recommend? Thanks.

Solution

Option 3)

students table - (student_id, student_name)
classes table - (class_id, class_name)
students_classes table - (student_class_id, student_id, class_id, grade)

Here Grade is an attribute of the student-class pairing.

That holds unless Grade has the possibility of becoming a full-fledged entity, in which case:

Option 4)

students table - (student_id, student_name)
classes table - (class_id, class_name)
students_classes table - (student_class_id, student_id, class_id)
grades table - (grade_id, grade, student_class_id)
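Option 3 above might look like the following. This is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the column types, the sample rows, and the UNIQUE constraint on (student_id, class_id) are assumptions added for illustration, not part of the answer.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption); the DDL is kept generic.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (
    student_id   INTEGER PRIMARY KEY,
    student_name TEXT NOT NULL
);
CREATE TABLE classes (
    class_id   INTEGER PRIMARY KEY,
    class_name TEXT NOT NULL
);
-- Option 3: the link table carries the grade as a plain attribute.
CREATE TABLE students_classes (
    student_class_id INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL REFERENCES students(student_id),
    class_id   INTEGER NOT NULL REFERENCES classes(class_id),
    grade      TEXT,
    UNIQUE (student_id, class_id)  -- assumption: one row per (student, class)
);
""")

# Sample data, invented for the example.
conn.execute("INSERT INTO students VALUES (1, 'Ada')")
conn.execute("INSERT INTO classes VALUES (10, 'Databases')")
conn.execute(
    "INSERT INTO students_classes (student_id, class_id, grade) VALUES (1, 10, 'A')"
)

# Reading a grade back needs only the link table plus the two lookups.
option3_rows = conn.execute("""
    SELECT s.student_name, c.class_name, sc.grade
    FROM students_classes AS sc
    JOIN students AS s ON s.student_id = sc.student_id
    JOIN classes  AS c ON c.class_id   = sc.class_id
""").fetchall()
print(option3_rows)
```

Without the UNIQUE constraint, the surrogate key alone would allow two rows (and so two grades) for the same student and class.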

OTHER TIPS

I'd go for option 2 personally. There is nothing wrong with a composite primary key for grades, and it captures the information you need in your data model.

In option 1, students_classes serves no purpose except to have a surrogate key.
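To make the composite-key point concrete, here is a rough sketch of Option 2, again with Python's sqlite3 standing in for MySQL (an assumption for runnability). The column types, sample rows, and the helper `try_second_grade` are hypothetical. It shows the composite primary key rejecting a second grade for the same (student, class) pair.

```python
import sqlite3

conn2 = sqlite3.connect(":memory:")
conn2.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, student_name TEXT NOT NULL);
CREATE TABLE classes  (class_id   INTEGER PRIMARY KEY, class_name   TEXT NOT NULL);
-- Option 2: the pair itself is the key, no surrogate column.
CREATE TABLE grades (
    student_id INTEGER NOT NULL REFERENCES students(student_id),
    class_id   INTEGER NOT NULL REFERENCES classes(class_id),
    grade      TEXT NOT NULL,
    PRIMARY KEY (student_id, class_id)
);
INSERT INTO students VALUES (1, 'Ada');
INSERT INTO classes  VALUES (10, 'Databases');
INSERT INTO grades   VALUES (1, 10, 'A');
""")


def try_second_grade(connection):
    """Return True if a second grade for the same pair is rejected."""
    try:
        connection.execute("INSERT INTO grades VALUES (1, 10, 'B')")
    except sqlite3.IntegrityError:
        return True
    return False


duplicate_rejected = try_second_grade(conn2)
grade_count = conn2.execute("SELECT COUNT(*) FROM grades").fetchone()[0]
print(duplicate_rejected, grade_count)
```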

Edit, after seeing other answers:

2NF: grade (non-key) depends solely on student/class (key)
3NF: does not apply. You have no non-key on non-key dependencies
BCNF: does not apply, you have one candidate key only

Option 2 is correct, except it should be called student_class, reflecting its n::n function, or Enrolment as an entity, and (student_id, class_id) is the PK.

Grade (as you have shown it) is a 1::1 dependency on that compound key (not on one or the other element), and on nothing else, therefore it is an attribute of student_class.

And therefore student_class is in 3NF.

If people did not start off by blindly sticking Idiot columns on everything that moved, as you did with Option 1, they would be able to understand the data better and thus normalise better. That (Idiot column in Option 1 as a starting point) interfered with your intuition that the (student_id, class_id) was the Identifier; no additional Idiot column with its additional index was necessary. Then when you got around to evaluating grade, its dependency on that PK is obvious.

Idiot columns damage the Relational capability of the database. If you have, say, three tables in a hierarchy, and you need to grab some columns from the top and bottom tables, you are forced to go through the middle table. If you had Relational Identifiers instead of Idiot columns, you could get from the bottom table to the top table without having to read the middle table.

It is only half true that there are so many joins in a "normalised" database. The full truth is that when the database is not correctly normalised, you are forced into many more joins than are necessary. In a truly Normalised database, with the same tables, the code requires far fewer joins.
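The join argument can be illustrated with a small sketch. Everything here is hypothetical: the department/course/offering tables, their columns, and the sample rows are invented for the example, and Python's sqlite3 stands in for MySQL. The `s_` tables use a surrogate id per table; the `r_` tables carry the parent's identifier down into each child key.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Surrogate-key version: each table knows only its immediate parent's id.
CREATE TABLE s_department (department_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE s_course (
    course_id     INTEGER PRIMARY KEY,
    department_id INTEGER REFERENCES s_department(department_id),
    title         TEXT
);
CREATE TABLE s_offering (
    offering_id INTEGER PRIMARY KEY,
    course_id   INTEGER REFERENCES s_course(course_id),
    term        TEXT
);

-- Compound-key version: the parent's identifier is part of each child's key.
CREATE TABLE r_department (dept_code TEXT PRIMARY KEY, dept_name TEXT);
CREATE TABLE r_course (
    dept_code TEXT REFERENCES r_department(dept_code),
    course_no INTEGER,
    title     TEXT,
    PRIMARY KEY (dept_code, course_no)
);
CREATE TABLE r_offering (
    dept_code TEXT,
    course_no INTEGER,
    term      TEXT,
    PRIMARY KEY (dept_code, course_no, term),
    FOREIGN KEY (dept_code, course_no) REFERENCES r_course(dept_code, course_no)
);

INSERT INTO s_department VALUES (1, 'Computing');
INSERT INTO s_course     VALUES (7, 1, 'Databases');
INSERT INTO s_offering   VALUES (42, 7, '2011-S1');

INSERT INTO r_department VALUES ('CS', 'Computing');
INSERT INTO r_course     VALUES ('CS', 101, 'Databases');
INSERT INTO r_offering   VALUES ('CS', 101, '2011-S1');
""")

# Top and bottom columns with surrogate keys: two joins, through the middle table.
via_middle = db.execute("""
    SELECT d.dept_name, o.term
    FROM s_offering AS o
    JOIN s_course     AS c ON c.course_id     = o.course_id
    JOIN s_department AS d ON d.department_id = c.department_id
""").fetchall()

# Same columns with carried identifiers: one join, the middle table is not read.
direct = db.execute("""
    SELECT d.dept_name, o.term
    FROM r_offering AS o
    JOIN r_department AS d ON d.dept_code = o.dept_code
""").fetchall()

print(via_middle, direct)
```

Both queries return the same row; the difference is only in how many tables each one has to touch.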

Here's a Data Model for a College from a recent assignment (simplified version).

IDEF1X Notation, for those who need an explanation of the symbols.

Note only one Surrogate Key is required.
- This is because in the alternative, (LastName+FirstName+Initials_BirthDate+BirthDate) would be the Person PK, and that would be carried as FK in 5 child/grandchild tables, which is 81 bytes, and that is not sensible.
See if you can appreciate that the Identifiers (solid lines) are carried through to the children and grandchildren; they have, and convey, meaning.
It would be stupid to add Surrogate Keys for TeacherId, StudentId, StaffId, when we have a perfectly good PersonId, which is the Foreign Key and already unique. (The columns are named as such to identify their roles.)
All Business Rules were implemented in DDL: FK Constraints; Check Constraints; Rules.
- Room has a 4-column Compound Key; Offering has a 3-column Compound Key; the two together eliminate double bookings.
- The Offering PK and the Student PK together form the PK for Enrolment (identical to this Question; the PKs are made up of different columns, that's all).
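The linked data model is not reproduced here, so the following is only a guess at the shape of the last point: an Enrolment key built from the Offering key plus the Student key, with the constraints declared in DDL. The column names (`course_code`, `term`, `section`, `person_id`), the sample rows, and the helper `rejected` are all hypothetical, and Python's sqlite3 stands in for the original DBMS.

```python
import sqlite3

college = sqlite3.connect(":memory:")
college.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
college.executescript("""
CREATE TABLE student (person_id INTEGER PRIMARY KEY);

-- Hypothetical 3-column compound key for Offering.
CREATE TABLE offering (
    course_code TEXT,
    term        TEXT,
    section     INTEGER,
    PRIMARY KEY (course_code, term, section)
);

-- Enrolment PK = Offering PK + Student PK; grade depends on the whole key.
CREATE TABLE enrolment (
    course_code TEXT,
    term        TEXT,
    section     INTEGER,
    person_id   INTEGER REFERENCES student(person_id),
    grade       TEXT,
    PRIMARY KEY (course_code, term, section, person_id),
    FOREIGN KEY (course_code, term, section)
        REFERENCES offering(course_code, term, section)
);

INSERT INTO student   VALUES (1);
INSERT INTO offering  VALUES ('DB101', '2011-S1', 1);
INSERT INTO enrolment VALUES ('DB101', '2011-S1', 1, 1, 'A');
""")


def rejected(sql):
    """Return True if the statement violates a declared constraint."""
    try:
        college.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Enrolling in an offering that does not exist breaks the compound foreign key.
orphan_rejected = rejected(
    "INSERT INTO enrolment VALUES ('XX999', '2011-S1', 1, 1, 'B')"
)
# Enrolling the same student in the same offering twice breaks the primary key.
repeat_rejected = rejected(
    "INSERT INTO enrolment VALUES ('DB101', '2011-S1', 1, 1, 'B')"
)
print(orphan_rejected, repeat_rejected)
```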

~~I'm a fan of third-normal form, where you have separate Student, Class and Grade tables and link them with many-to-many tables like ClassStudent and GradeClass.~~

~~But it depends on how you want to maintain it in the future. Ultimately it comes down to future extension and maintainability. Which is why I prefer 3NF.~~

EDIT

Axn's answer is much better than mine.

It all depends, really. Option 1 is probably the most robust way of doing this application; option 2 might get you there quicker for this iteration. Will the change from option 2 -> 1 be that painful in the future? How sure are you that you will need that extra flexibility?
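One way to gauge how painful the move from option 2 to option 1 would be is to sketch the migration. The version below is an assumption-laden illustration, not a tested MySQL procedure: it uses Python's sqlite3, invents sample rows, and names the new table `grades_v1` purely so both layouts can coexist in the example.

```python
import sqlite3

mig = sqlite3.connect(":memory:")
mig.executescript("""
-- Starting point: option 2, keyed on the pair.
CREATE TABLE grades (
    student_id INTEGER,
    class_id   INTEGER,
    grade      TEXT,
    PRIMARY KEY (student_id, class_id)
);
INSERT INTO grades VALUES (1, 10, 'A'), (1, 11, 'B'), (2, 10, 'C');

-- Step 1: create the link table and give every existing pair a surrogate id.
CREATE TABLE students_classes (
    student_class_id INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL,
    class_id   INTEGER NOT NULL,
    UNIQUE (student_id, class_id)
);
INSERT INTO students_classes (student_id, class_id)
    SELECT student_id, class_id FROM grades;

-- Step 2: rebuild grades keyed on the surrogate id (hypothetical name grades_v1).
CREATE TABLE grades_v1 (
    student_class_id INTEGER PRIMARY KEY
        REFERENCES students_classes(student_class_id),
    grade TEXT
);
INSERT INTO grades_v1 (student_class_id, grade)
    SELECT sc.student_class_id, g.grade
    FROM grades AS g
    JOIN students_classes AS sc
      ON sc.student_id = g.student_id AND sc.class_id = g.class_id;
""")

# Check that every original (student, class, grade) triple survived the move.
migrated = mig.execute("""
    SELECT sc.student_id, sc.class_id, g.grade
    FROM grades_v1 AS g
    JOIN students_classes AS sc ON sc.student_class_id = g.student_class_id
    ORDER BY sc.student_id, sc.class_id
""").fetchall()
print(migrated)
```

The data move itself is two INSERT ... SELECT statements; the larger cost in practice would be updating every query that joins on (student_id, class_id).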

I would recommend just going for option 1. The queries won't be that much more complicated and if you are using an ORM (like ActiveRecord for Rails, among many), then the difference is practically null.

Licensed under: CC-BY-SA with attribution