In my view, "best" is almost always a matter of opinion, but there are still some general points that can be made.
Using relational structure
Once not all pairs are valid, you have to store that information somewhere: either which pairs are invalid or which pairs are valid. Your sample with an additional table is completely valid in terms of a relational DBMS. In fact, if we face such an issue, it is close to the only way to resolve it at the database-design level. With it:
- You're storing valid pairs. As I said, this information has to live somewhere, and here it lives in a new table.
- You're maintaining referential integrity via FOREIGN KEY, so your data will always be correct and point to a valid pair.
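As a rough illustration of the two points above, the structure could look like the sketch below. The table names and the EventCategoryId, id, Category and Subcategory columns come from your question; the column types, the unique key and the constraint name fk_bonza_event_category are my assumptions, so adjust them to your real schema.

```sql
-- Sketch only: column types, sizes, the unique key and the constraint
-- name are assumptions for illustration.
CREATE TABLE EventCategories (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    Category    VARCHAR(32)  NOT NULL,
    Subcategory VARCHAR(32)  NOT NULL,
    PRIMARY KEY (id),
    -- each valid pair is stored exactly once
    UNIQUE KEY uq_category_pair (Category, Subcategory)
) ENGINE=InnoDB;

CREATE TABLE Bonza (
    id              INT UNSIGNED NOT NULL AUTO_INCREMENT,
    EventCategoryId INT UNSIGNED NOT NULL,
    PRIMARY KEY (id),
    CONSTRAINT fk_bonza_event_category
        FOREIGN KEY (EventCategoryId) REFERENCES EventCategories (id)
        ON DELETE RESTRICT
) ENGINE=InnoDB;

-- Register the valid pairs (sample values).
INSERT INTO EventCategories (Category, Subcategory)
VALUES ('a', 'x'), ('b', 'y');

-- Accepted: the row points at an existing valid pair.
INSERT INTO Bonza (EventCategoryId)
SELECT id FROM EventCategories WHERE Category = 'a' AND Subcategory = 'x';

-- Rejected with a foreign key error: no such pair exists.
INSERT INTO Bonza (EventCategoryId) VALUES (999);
```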
What can go wrong, and how could this affect performance?
To reconstruct the full row, you'll need a simple JOIN:
SELECT
    Bonza.id,
    EventCategories.Subcategory,
    EventCategories.Category
FROM
    Bonza
    LEFT JOIN EventCategories
        ON Bonza.EventCategoryId = EventCategories.id
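- Performance of this JOIN will be good: you join on the FK, and InnoDB requires an index on both the referencing and the referenced column, so MySQL can resolve the join through an index instead of a full table scan. How well it performs depends on index quality (i.e. its cardinality), but in general it will be fast.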
- How complex is one JOIN? It's a simple operation, though it may add some overhead to complex queries. In my opinion that's fine: there's nothing complex in it.
- You can change the allowed pairs simply by changing the EventCategories data. That is, you can easily lift a restriction on a prohibited pair (by inserting it as a valid pair) and this will affect nothing else. I see this as a great benefit of this structure. Adding a new restriction isn't as simple, because it requires a DELETE operation. You've chosen the ON DELETE RESTRICT action for your FK, which means you'll have to handle all conflicting records before adding the new restriction. This depends, of course, on your application's logic, but think of it another way: if you add a new restriction, shouldn't all conflicting records be removed as well? If so, change your FK to ON DELETE CASCADE.
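To make the last point concrete, here is a sketch of lifting and adding a restriction. The pair values and the constraint name fk_bonza_event_category are hypothetical; use the name your FK actually has (SHOW CREATE TABLE Bonza will tell you).

```sql
-- Lifting a restriction: allow a new pair. Existing rows are unaffected.
INSERT INTO EventCategories (Category, Subcategory) VALUES ('c', 'y');

-- Adding a restriction: first see which Bonza rows still use the pair.
SELECT b.id
FROM Bonza AS b
    JOIN EventCategories AS ec ON b.EventCategoryId = ec.id
WHERE ec.Category = 'c' AND ec.Subcategory = 'z';

-- With ON DELETE RESTRICT this DELETE fails while such rows exist,
-- so they have to be updated or deleted first.
DELETE FROM EventCategories
WHERE Category = 'c' AND Subcategory = 'z';

-- Alternative: let the database remove conflicting rows automatically.
-- The constraint name is an assumption.
ALTER TABLE Bonza DROP FOREIGN KEY fk_bonza_event_category;
ALTER TABLE Bonza
    ADD CONSTRAINT fk_bonza_event_category
        FOREIGN KEY (EventCategoryId) REFERENCES EventCategories (id)
        ON DELETE CASCADE;
```

Keep in mind that with ON DELETE CASCADE the DELETE above silently removes every Bonza row that used the pair, so switch only if that really is what your application logic wants.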
So: having an additional table is a simple, flexible and actually easy way to resolve your issue.
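If you want to check the performance point on your own data rather than take my word for it, EXPLAIN is the usual tool. A minimal sketch, assuming EventCategories.id is the primary key of that table:

```sql
-- Ask MySQL how it plans to execute the join.
EXPLAIN
SELECT
    Bonza.id,
    EventCategories.Subcategory,
    EventCategories.Category
FROM
    Bonza
    LEFT JOIN EventCategories
        ON Bonza.EventCategoryId = EventCategories.id;
```

In the output, look at the type and key columns of the EventCategories row: a type of eq_ref with key PRIMARY would indicate one primary-key lookup per Bonza row, while ALL would mean a full table scan.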
Storing in one table
You've mentioned that you can use a trigger for your issue. That is indeed applicable, so I'll show that it has its weaknesses (together with some benefits). Let's say we create the trigger:
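DELIMITER //
-- Fires on INSERT only; a matching BEFORE UPDATE trigger is needed to guard updates too.
CREATE TRIGGER catCheck BEFORE INSERT ON Bonza
FOR EACH ROW
BEGIN
    -- Reject the prohibited (Category, Subcategory) combination.
    IF NEW.Subcategory = 'z' AND NEW.Category = 'c' THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Invalid category pair';
    END IF;
END//
DELIMITER ;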
Obviously, we still have to store the information about how to validate our pairs, but in this case we store the invalid combinations. When invalid data arrives, we catch it inside the trigger and abort the insert, returning a user-defined SQLSTATE (45000) together with some explanatory text. Now, what about complexity and performance?
- This way allows you to store your data as it is, in one table. This is a benefit: you get rid of the JOIN, and integrity is maintained by another tool. You can forget about storing pairs and handling them, hiding this logic in the trigger.
- So, you'll win on SELECT statements: your data always contains valid pairs, and no JOIN is needed.
- But, yes, you'll lose on INSERT/UPDATE statements: they will invoke the trigger and, within it, some checking condition. It may be complex (many IF parts) and MySQL will check them one by one. Combining them into one single condition wouldn't help much, because in the worst case MySQL will still evaluate it to the end.
- Scalability of this method is poor. Every time you need to add or remove a pair restriction, you'll have to redefine the trigger. Even worse, unlike the JOIN case, you won't be able to do any cascade actions. Instead you'll have to handle existing rows manually.
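Here is roughly what the last point means in practice. The second prohibited pair ('c', 'y') is a hypothetical example; the rest follows the catCheck trigger from above.

```sql
DELIMITER //

-- Prohibiting one more pair means dropping and re-creating the trigger.
DROP TRIGGER IF EXISTS catCheck//
CREATE TRIGGER catCheck BEFORE INSERT ON Bonza
FOR EACH ROW
BEGIN
    IF (NEW.Category = 'c' AND NEW.Subcategory = 'z')
       OR (NEW.Category = 'c' AND NEW.Subcategory = 'y') THEN  -- hypothetical new restriction
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Invalid category pair';
    END IF;
END//

DELIMITER ;

-- The trigger only checks new rows. Rows that already hold the newly
-- prohibited pair stay in the table until you remove them by hand.
DELETE FROM Bonza
WHERE Category = 'c' AND Subcategory = 'y';
```

The same condition would have to be repeated in a BEFORE UPDATE trigger, so each new restriction means editing two trigger bodies plus the manual cleanup.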
What to choose?
For the common case, if you don't know for certain what your application's conditions will be, I recommend the JOIN option. It's simple, readable and scalable, and it fits relational DB principles.
For some special cases, you may want to choose the second option. Those conditions would be:
- Allowed pairs will never change (or will change very rarely)
- SELECT statements will be run much, much more often than INSERT/UPDATE statements, and SELECT performance is the highest priority for your application.