Why have a join table for a 1:m relation in SQL?

https://stackoverflow.com/questions/1263486

13-09-2019

Question

What is the benefit of having junction tables between the first 1:m and the second 1:m relations in the following database?

Database diagram: http://dl.getdropbox.com/u/175564/db/db-simple.png

Joe Celko's book Trees and Hierarchies in SQL for Smarties says that the reason is to keep the relations in the 1:m's unique. For instance, the following tables respectively prevent a user from asking exactly the same question twice and prevent a question from getting exactly the same answer twice.

The first 1:m relation

CREATE TABLE users_questions (
    user_id     INTEGER NOT NULL REFERENCES users (user_id),
    question_id INTEGER NOT NULL REFERENCES questions (question_id),
    PRIMARY KEY (user_id, question_id)  -- a user is not allowed to ask the same question twice
);

The second 1:m relation

CREATE TABLE questions_answers (
    question_id INTEGER NOT NULL REFERENCES questions (question_id),
    answer_id   INTEGER NOT NULL REFERENCES answers (answer_id),
    PRIMARY KEY (question_id, answer_id)  -- a question is not allowed to have the same answer twice
);
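To see what these keys actually do, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table names are written with underscores because a bare hyphen is not valid in an SQL identifier. The column types and the `link` helper are assumptions for illustration, and the parent tables are left out for brevity.

```python
import sqlite3

# In-memory database holding only the two junction tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_questions (
        user_id     INTEGER NOT NULL,
        question_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, question_id)
    );
    CREATE TABLE questions_answers (
        question_id INTEGER NOT NULL,
        answer_id   INTEGER NOT NULL,
        PRIMARY KEY (question_id, answer_id)
    );
""")

def link(table: str, a: int, b: int) -> bool:
    """Hypothetical helper: insert a pair and return False if the primary key rejects it.

    `table` is interpolated directly, so only pass the fixed names defined above.
    """
    try:
        with conn:  # commits on success, rolls back if the INSERT raises
            conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (a, b))
        return True
    except sqlite3.IntegrityError:
        return False

print(link("users_questions", 1, 10))    # True: user 1 asks question 10
print(link("users_questions", 1, 10))    # False: the same pair again
print(link("questions_answers", 10, 7))  # True
print(link("questions_answers", 10, 7))  # False
```

Nothing here stops answer 7 from also being linked to question 11. That is the many-to-many point raised in the other tips.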

This uniqueness benefit does not convince me that it is worth making my code more complex. I don't see why I should restrict, in the database, the possibility of having questions or answers with the same IDs, since I could presumably forbid that in PHP instead.

Solution

It's usually about avoiding duplicated data.

As for your reasoning: yes, you can enforce this in the business layer, but if you make a mistake there, it could break a significant amount of code. Right now your data model may have only a few tables. Lucky you. As the model grows, if you can't make sense of its structure and have to put all the logic for maintaining denormalised tables in your GUI layer, you can very easily run into problems. Note also that it is hard to make such checks thread-safe from the GUI side of your SQL database without locking, which will destroy your performance.
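The thread-safety problem is easiest to see with the check-then-insert pattern an application-layer rule usually turns into. The sketch below uses Python's sqlite3 module, and `make_db`, `already_linked` and `simulate_race` are hypothetical names. It replays two requests deterministically: both run the application-level duplicate check before either one inserts. Real concurrency would interleave them unpredictably, and the fixed ordering simply makes that failure reproducible.

```python
import sqlite3

def make_db(with_constraint: bool) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    pk = ", PRIMARY KEY (user_id, question_id)" if with_constraint else ""
    conn.execute(
        "CREATE TABLE users_questions "
        f"(user_id INTEGER NOT NULL, question_id INTEGER NOT NULL{pk})"
    )
    return conn

def already_linked(conn: sqlite3.Connection, user_id: int, question_id: int) -> bool:
    """The application-level uniqueness check (what the PHP code would do)."""
    row = conn.execute(
        "SELECT 1 FROM users_questions WHERE user_id = ? AND question_id = ?",
        (user_id, question_id),
    ).fetchone()
    return row is not None

def simulate_race(with_constraint: bool) -> int:
    conn = make_db(with_constraint)
    # Two requests both pass the check before either inserts.
    ok_a = not already_linked(conn, 1, 42)
    ok_b = not already_linked(conn, 1, 42)
    for ok in (ok_a, ok_b):
        if ok:
            try:
                with conn:
                    conn.execute("INSERT INTO users_questions VALUES (?, ?)", (1, 42))
            except sqlite3.IntegrityError:
                pass  # the database rejected the duplicate
    return conn.execute("SELECT COUNT(*) FROM users_questions").fetchone()[0]

print(simulate_race(with_constraint=False))  # 2: the duplicate slipped through
print(simulate_race(with_constraint=True))   # 1: the PRIMARY KEY blocked it
```

The application check is still useful for friendly error messages, but only the constraint actually guarantees uniqueness.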

DBMSs are very good at dealing with these problems. You can keep your data model clean and use indexes to get the speed you need. Your goal should be to get the model right first, and only denormalise your tables when you see a clear need to do so (for performance, etc.).
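As a small illustration of letting indexes provide the speed, the sketch below (Python's sqlite3 module, with a hypothetical index name `idx_uq_question`) adds a secondary index so that finding who asked a given question does not have to scan the junction table. The composite primary key already serves lookups by `user_id`, its leading column, but not lookups by `question_id` alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_questions (
        user_id     INTEGER NOT NULL,
        question_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, question_id)
    );
    -- Secondary index for lookups by question alone.
    CREATE INDEX idx_uq_question ON users_questions (question_id);
""")

def query_plan(sql: str, params: tuple = ()) -> str:
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN rows, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

plan = query_plan("SELECT user_id FROM users_questions WHERE question_id = ?", (10,))
print(plan)  # exact wording varies by SQLite version; it should mention idx_uq_question
```

The schema stays normalised. Only the access path changes.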

Believe it or not, there are many situations where normalised data makes your application code easier to write, not harder. For instance, if you keep questions and answers in one big table, you have to write code to check uniqueness yourself. If you have a table with a primary key, you simply write:

INSERT INTO your_table (col1, col2) VALUES (@id, @value); -- NOTE: you would probably
-- make the id column an autonumber so you don't have to worry about this

The database will reject the insert if the value is not unique OR if you are inserting an answer with no question. All you need to do is check whether the insert succeeded, nothing more. Which approach do you think takes less code?
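Here is a minimal sketch of that "just check whether the insert worked" style, again with Python's sqlite3 module. The `questions`/`answers` layout, the `body` column and the `add_answer` helper are hypothetical. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, which is off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE questions (
        question_id INTEGER PRIMARY KEY  -- autonumber: SQLite assigns it if omitted
    );
    CREATE TABLE answers (
        answer_id   INTEGER PRIMARY KEY,
        question_id INTEGER NOT NULL REFERENCES questions (question_id),
        body        TEXT NOT NULL,
        UNIQUE (question_id, body)  -- the same answer text cannot be posted twice per question
    );
""")

def add_answer(question_id: int, body: str) -> bool:
    """Insert an answer; the database does the validation, we only check the outcome."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO answers (question_id, body) VALUES (?, ?)",
                (question_id, body),
            )
        return True
    except sqlite3.IntegrityError:
        return False

qid = conn.execute("INSERT INTO questions DEFAULT VALUES").lastrowid
conn.commit()
print(add_answer(qid, "Use a foreign key."))  # True
print(add_answer(qid, "Use a foreign key."))  # False: duplicate (question_id, body)
print(add_answer(999, "Orphan answer"))       # False: no question 999
```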

OTHER TIPS

Well, the unique relations thing seems nonsensical to me, probably because I'm used to DBMSes where you can define unique keys other than the primary key. In my world, mapping tables like those are how you implement a many-to-many relationship, and using them for a one-to-many relationship is madness. If you do that, you may intend the relationship to be used as one-to-many, but what you've actually implemented is many-to-many support.
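To make the distinction concrete, the sketch below (Python's sqlite3 module, with illustrative table names) compares three layouts.

- **`answers_fk`:** a plain `question_id` column on the child row, so each answer belongs to exactly one question.
- **`questions_answers`:** a mapping table keyed on `(question_id, answer_id)`. Nothing stops one answer from being linked to two questions.
- **`questions_answers_1m`:** the same mapping table with an extra unique key on `answer_id`. This is one way to restrict it back to one-to-many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One-to-many the usual way: the child row carries the parent's key.
    CREATE TABLE answers_fk (
        answer_id   INTEGER PRIMARY KEY,
        question_id INTEGER NOT NULL
    );
    -- Mapping table with a compound key: this is many-to-many support.
    CREATE TABLE questions_answers (
        question_id INTEGER NOT NULL,
        answer_id   INTEGER NOT NULL,
        PRIMARY KEY (question_id, answer_id)
    );
    -- Same mapping table plus a unique key on the child side: back to one-to-many.
    CREATE TABLE questions_answers_1m (
        question_id INTEGER NOT NULL,
        answer_id   INTEGER NOT NULL UNIQUE,
        PRIMARY KEY (question_id, answer_id)
    );
""")

def try_insert(sql: str, params: tuple) -> bool:
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

def second_link_allowed(table: str) -> bool:
    """Link answer 7 to question 10, then report whether it can also be linked to question 11."""
    sql = f"INSERT INTO {table} (question_id, answer_id) VALUES (?, ?)"
    if not try_insert(sql, (10, 7)):
        raise RuntimeError("the first link should always succeed")
    return try_insert(sql, (11, 7))

for table in ("answers_fk", "questions_answers", "questions_answers_1m"):
    print(table, second_link_allowed(table))
# answers_fk False
# questions_answers True
# questions_answers_1m False
```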

I don't agree, though, that unique compound keys have no value in the persistence layer just because you can enforce uniqueness in the application layer. Persistence-layer uniqueness constraints have benefits that are hard to replicate, such as, in MySQL, the ability to use INSERT ... ON DUPLICATE KEY UPDATE.
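An upsert like that only works because the database knows the compound key. The sketch below stays runnable with Python's standard library, so it uses SQLite's analogous syntax, INSERT ... ON CONFLICT (...) DO UPDATE (available since SQLite 3.24), rather than the MySQL statement itself. The `question_views` table, its `view_count` column and the `record_view` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE question_views (
        user_id     INTEGER NOT NULL,
        question_id INTEGER NOT NULL,
        view_count  INTEGER NOT NULL DEFAULT 1,
        PRIMARY KEY (user_id, question_id)
    )
""")

def record_view(user_id: int, question_id: int) -> None:
    # One statement: insert the pair, or bump the counter if the compound key already exists.
    # MySQL would spell this INSERT ... ON DUPLICATE KEY UPDATE view_count = view_count + 1.
    with conn:
        conn.execute(
            """
            INSERT INTO question_views (user_id, question_id) VALUES (?, ?)
            ON CONFLICT (user_id, question_id) DO UPDATE SET view_count = view_count + 1
            """,
            (user_id, question_id),
        )

for _ in range(3):
    record_view(1, 10)
record_view(2, 10)
print(conn.execute(
    "SELECT user_id, question_id, view_count FROM question_views ORDER BY user_id"
).fetchall())
# [(1, 10, 3), (2, 10, 1)]
```

Without the compound key, the application would have to SELECT first and then choose between INSERT and UPDATE. That reintroduces the race described in the accepted answer.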

I agree that the join table for a one-to-many in this situation doesn't seem to add much benefit, and as @chaos says, you actually end up implementing many-to-many support. But Joe Celko is a smart guy, so is this really the exact answer he gives?

Another possible reason for using a join table for a one-to-many relationship is that it completely removes the dependency of questions/answers on users.

For example, say you added a Dogs table and a Deities table. We all know that dogs can't register as users because they don't have email addresses, and gods don't register as users because, well, it's beneath them. Maybe dogs and gods still ask questions, though, and to support that you might implement a dogs-questions table and a deities-questions table. In theory this is still many-to-many, but in practice you do it so that you can have multiple one-to-manys.
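A rough sketch of that layout in Python's sqlite3 module follows. All table names, including `dogs_questions` and `deities_questions`, are illustrative, and so is the sample data. The `questions` table has no column pointing at any asker, and each kind of asker gets its own link table. `UNIQUE` on `question_id` in each link table keeps every individual relationship one-to-many. It does not stop the same question from appearing in two different link tables; enforcing a single asker across all of them would need something extra, such as triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE questions (question_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE users     (user_id  INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE dogs      (dog_id   INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE deities   (deity_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);

    -- One link table per kind of asker; UNIQUE (question_id) makes each one 1:m.
    CREATE TABLE users_questions (
        user_id     INTEGER NOT NULL REFERENCES users (user_id),
        question_id INTEGER NOT NULL UNIQUE REFERENCES questions (question_id)
    );
    CREATE TABLE dogs_questions (
        dog_id      INTEGER NOT NULL REFERENCES dogs (dog_id),
        question_id INTEGER NOT NULL UNIQUE REFERENCES questions (question_id)
    );
    CREATE TABLE deities_questions (
        deity_id    INTEGER NOT NULL REFERENCES deities (deity_id),
        question_id INTEGER NOT NULL UNIQUE REFERENCES questions (question_id)
    );

    INSERT INTO users   VALUES (1, 'ann@example.com');
    INSERT INTO dogs    VALUES (1, 'Rex');
    INSERT INTO deities VALUES (1, 'Thor');
    INSERT INTO questions VALUES (1, 'Why join tables?'), (2, 'Where is my ball?'), (3, 'Why thunder?');
    INSERT INTO users_questions   VALUES (1, 1);
    INSERT INTO dogs_questions    VALUES (1, 2);
    INSERT INTO deities_questions VALUES (1, 3);
""")

# Every question with its asker, whatever kind of asker it was.
rows = conn.execute("""
    SELECT q.title, 'user' AS asker_type, u.email AS asker FROM questions q
        JOIN users_questions uq ON uq.question_id = q.question_id
        JOIN users u ON u.user_id = uq.user_id
    UNION ALL
    SELECT q.title, 'dog', d.name FROM questions q
        JOIN dogs_questions dq ON dq.question_id = q.question_id
        JOIN dogs d ON d.dog_id = dq.dog_id
    UNION ALL
    SELECT q.title, 'deity', g.name FROM questions q
        JOIN deities_questions gq ON gq.question_id = q.question_id
        JOIN deities g ON g.deity_id = gq.deity_id
    ORDER BY 1
""").fetchall()
for row in rows:
    print(row)
```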

Licensed under: CC-BY-SA with attribution
