Question

Hypothetically, I have an ENUM column named Category, and an ENUM column named Subcategory. I will sometimes want to SELECT on Category alone, which is why they are split out.

CREATE TABLE `Bonza` (
   `EventId`     INT UNSIGNED NOT NULL AUTO_INCREMENT,
   `Category`    ENUM("a", "b", "c") NOT NULL,
   `Subcategory` ENUM("x", "y", "z") NOT NULL,

   PRIMARY KEY(`EventId`)
) ENGINE=InnoDB;

But not all subcategories are valid for all categories (say, "z" is only valid with "a" and "b"), and it irks me that this constraint isn't baked into the design of the table. If MySQL had some sort of "pair" type (where a column of that type were indexable on a leading subsequence of the value) then this wouldn't be such an issue.

I'm stuck with writing long conditionals in a trigger if I want to maintain integrity between category and subcategory. Or am I better off just leaving it? What would you do?

I suppose the most relationally-oriented approach would be storing an EventCategoryId instead, and mapping it to a table containing all valid event type pairs, and joining on that table every time I want to look up the meaning of an event category.

-- The lookup table comes first so that the foreign key in Bonza can reference it
CREATE TABLE `EventCategories` (
   `EventCategoryId` INT UNSIGNED NOT NULL,
   `Category`    ENUM("a", "b", "c") NOT NULL,
   `Subcategory` ENUM("x", "y", "z") NOT NULL,

   PRIMARY KEY(`EventCategoryId`)
) ENGINE=InnoDB;
-- Now populate this table with valid category/subcategory pairs at installation

CREATE TABLE `Bonza` (
   `EventId`         INT UNSIGNED NOT NULL AUTO_INCREMENT,
   `EventCategoryId` INT UNSIGNED NOT NULL,

   PRIMARY KEY(`EventId`),
   FOREIGN KEY (`EventCategoryId`) REFERENCES `EventCategories` (`EventCategoryId`)
     ON DELETE RESTRICT ON UPDATE CASCADE
) ENGINE=InnoDB;

Can I do anything simpler? This lookup will potentially cost me complexity and performance in calling code, for INSERTs into Bonza, no?

Solution

Assuming that your categories and subcategories don't change that often, and assuming that you're willing to live with a big update when they do, you can do the following:

Use an EventCategories table to control the hierarchical constraint between categories and subcategories. The primary key for that table should be a compound key containing both Category and Subcategory. Reference this table in your Bonza table. The foreign key in Bonza happens to contain both of the columns that you want to filter by, so you don't need to join to get what you're after. It will also be impossible to assign an invalid combination.

-- Create the lookup table first: the foreign key in Bonza needs it to exist.
CREATE TABLE `EventCategories` (
   `EventCategoryId` INT UNSIGNED NOT NULL,
   `Category`    CHAR(1) NOT NULL,
   `Subcategory` CHAR(1) NOT NULL,

   PRIMARY KEY(`Category`, `Subcategory`)
) ENGINE=InnoDB;

CREATE TABLE `Bonza` (
   `EventId`         INT UNSIGNED NOT NULL AUTO_INCREMENT,
   `Category`        CHAR(1) NOT NULL,
   `Subcategory`     CHAR(1) NOT NULL,

   PRIMARY KEY(`EventId`),
   -- The composite key ties every row to a valid (Category, Subcategory) pair.
   FOREIGN KEY (`Category`, `Subcategory`)
     REFERENCES `EventCategories` (`Category`, `Subcategory`)
     ON DELETE RESTRICT ON UPDATE CASCADE
) ENGINE=InnoDB;
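
As a rough illustration of how this behaves, here is a sketch that seeds EventCategories and then tries a valid and an invalid insert. The seed rows and the EventCategoryId values are assumptions based on the example in the question ("z" only valid with "a" and "b"); adjust them to your real pairs.

```sql
-- Assumed seed data: every pair except ('c', 'z').
INSERT INTO `EventCategories` (`EventCategoryId`, `Category`, `Subcategory`) VALUES
   (1, 'a', 'x'), (2, 'a', 'y'), (3, 'a', 'z'),
   (4, 'b', 'x'), (5, 'b', 'y'), (6, 'b', 'z'),
   (7, 'c', 'x'), (8, 'c', 'y');

-- Accepted: ('a', 'z') is listed in EventCategories.
INSERT INTO `Bonza` (`Category`, `Subcategory`) VALUES ('a', 'z');

-- Rejected with a foreign key constraint error: ('c', 'z') is not listed.
INSERT INTO `Bonza` (`Category`, `Subcategory`) VALUES ('c', 'z');

-- Filtering on Category alone reads Bonza only, no join needed.
SELECT `EventId`, `Subcategory`
FROM `Bonza`
WHERE `Category` = 'a';
```

The INSERT into Bonza needs no lookup in calling code either: the caller supplies the two values directly and the foreign key does the checking.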

OTHER TIPS

My thought is: "best" is almost always opinion-based, but there are still some general things that can be said.

Using relational structure

Once not all pairs are valid, you have to store that information somewhere: either which pairs are invalid or which pairs are valid. Your example with the additional table is completely valid for a relational DBMS. In fact, if you face this issue, it is nearly the only way to resolve it at the database-design level. With it:

  • You're storing the valid pairs. As I've said, you have to store this information somewhere, and here you do it by creating a new table.
  • You're maintaining referential integrity via a FOREIGN KEY, so your data will always be correct and point to a valid pair.

What bad things may happen and how could this impact the performance?

  • To reconstruct the full row, you'll need a simple JOIN:

    SELECT 
      Bonza.EventId, 
      EventCategories.Subcategory,
      EventCategories.Category
    FROM
      Bonza
        LEFT JOIN EventCategories
        ON Bonza.EventCategoryId=EventCategories.EventCategoryId

  • Performance of this JOIN will be good: you join on the FK against the primary key of EventCategories, so the lookup goes through an index rather than a table scan. How well it performs depends on index quality (i.e. its cardinality), but in general it will be fast.
  • How complex is one JOIN? It's a simple operation, but it may add some overhead to complex queries. In my opinion, though, that's OK. There's nothing complex in it.
  • You can change the pairs simply by changing the data in EventCategories. Lifting a restriction on a prohibited pair is easy (insert the pair) and affects nothing else. I see this as a great benefit of this structure. Adding a new restriction isn't so simple, though, because it requires a DELETE. You've chosen ON DELETE RESTRICT for your FK, which means you'll have to deal with all conflicting records before adding the new restriction. This depends, of course, on your application's logic, but think of it another way: if you add a new restriction, shouldn't all conflicting records be removed as well? If so, change your FK to ON DELETE CASCADE.

So: having an additional table is a simple, flexible and actually easy way to resolve your issue.
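
To make the "changing pairs" point concrete, here is a sketch against the EventCategories/Bonza tables from the question (the version with EventCategoryId). The specific pairs and the id value 9 are placeholders chosen for illustration only.

```sql
-- Lift a restriction: allow the previously prohibited pair ('c', 'z').
-- Nothing in Bonza is affected by this.
INSERT INTO `EventCategories` (`EventCategoryId`, `Category`, `Subcategory`)
VALUES (9, 'c', 'z');

-- Add a restriction, step 1: find the Bonza rows that still use the pair.
SELECT b.`EventId`
FROM `Bonza` AS b
  INNER JOIN `EventCategories` AS ec
    ON b.`EventCategoryId` = ec.`EventCategoryId`
WHERE ec.`Category` = 'b' AND ec.`Subcategory` = 'z';

-- Step 2: remove the pair. With ON DELETE RESTRICT this DELETE fails
-- as long as any Bonza row still references it.
DELETE FROM `EventCategories`
WHERE `Category` = 'b' AND `Subcategory` = 'z';
```

With ON DELETE CASCADE instead, the DELETE alone would also remove the referencing Bonza rows, so only choose that if dropping those events is really what your logic wants.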

Storing in one table

You've mentioned that you can use a trigger for your issue. That is indeed applicable, so I'll show that it has its weaknesses (along with some benefits). Let's say we create the trigger:

DELIMITER //
CREATE TRIGGER catCheck BEFORE INSERT ON Bonza
    FOR EACH ROW
    BEGIN
        -- Reject the invalid pair: "z" is not allowed with "c"
        IF NEW.Subcategory = 'z' AND NEW.Category = 'c' THEN
            SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Invalid category pair';
        END IF;
    END;//
DELIMITER ;

Obviously, we still have to store the information about how to validate our pairs, but in this case we store the invalid combinations. Once invalid data arrives, the trigger catches it and aborts the insert, returning a user-defined SQLSTATE (45000) together with some explanatory text. Now, what about complexity and performance?

  • This way lets you store your data as it is, in one table. This is a benefit: you get rid of the JOIN, and integrity is maintained by another tool. You can forget about storing pairs and handling them, hiding this logic in the trigger.
  • So, you'll win on SELECT statements: your data always contains valid pairs, and no JOIN is needed.
  • But, yes, you'll lose on INSERT/UPDATE statements: they invoke the trigger (updates need a matching BEFORE UPDATE trigger as well), and within it some checking condition. It may be complex (many IF parts), and MySQL will check them one by one. Folding them into one single condition won't help much because, in the worst case, MySQL still has to evaluate it to the end.
  • The scalability of this method is poor. Every time you need to add or remove a pair restriction, you'll have to redefine the trigger. Even worse, unlike the JOIN case, you won't be able to use any cascade actions. Instead you'll have to handle that manually.
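
To show what "redefine the trigger" means in practice, here is a sketch. The extra rule ('y' not valid with 'c') and the trigger name catCheckUpdate are made up for illustration; only catCheck comes from the example above.

```sql
-- MySQL cannot change a trigger body in place: drop it, then create it again.
DROP TRIGGER IF EXISTS catCheck;

DELIMITER //
CREATE TRIGGER catCheck BEFORE INSERT ON Bonza
    FOR EACH ROW
    BEGIN
        -- Existing rule plus a hypothetical new one: 'y' is not valid with 'c'.
        IF NEW.Category = 'c' AND NEW.Subcategory IN ('y', 'z') THEN
            SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Invalid category pair';
        END IF;
    END;//

-- Hypothetical companion trigger so that UPDATEs are checked as well.
CREATE TRIGGER catCheckUpdate BEFORE UPDATE ON Bonza
    FOR EACH ROW
    BEGIN
        IF NEW.Category = 'c' AND NEW.Subcategory IN ('y', 'z') THEN
            SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Invalid category pair';
        END IF;
    END;//
DELIMITER ;
```

Note that rows already in Bonza that violate the new rule stay where they are. The triggers only look at new writes, so cleaning up old data is the manual handling mentioned above.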

What to choose?

For the common case, if you don't know for certain what your application's conditions will be, I recommend the JOIN option. It's simple, readable and scalable. It fits relational DB principles.

For some special cases, you may want to choose the second option. Those conditions would be:

  • Allowed pairs will never change (or will change very rarely).
  • SELECT statements will run much, much more often than INSERT/UPDATE statements, and SELECT performance is the highest priority for your application.

I liked this problem. With this information, though, I would define the set of valid pairs in just one ENUM column:

CategorySubcategory ENUM("ax", "ay", "az", "bx", "by", "bz", "cx", "cy")

I think this will only be useful with a limited set of values. Once the set gets bigger, I would personally choose your second option rather than the trigger-based one. The first reason is purely an opinion: I don't like triggers too much, and they don't like me. The second reason is that a well-indexed and properly sized reference from one table to another performs really well.
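
For what it's worth, here is a sketch of how the single ENUM column could look in the table, and how you would still select on the category alone. The table layout and the index are my assumptions, not something from the question.

```sql
CREATE TABLE `Bonza` (
   `EventId`             INT UNSIGNED NOT NULL AUTO_INCREMENT,
   -- Only valid pairs are listed, so 'cz' simply does not exist as a value.
   `CategorySubcategory` ENUM('ax', 'ay', 'az', 'bx', 'by', 'bz', 'cx', 'cy') NOT NULL,

   PRIMARY KEY (`EventId`),
   KEY (`CategorySubcategory`)
) ENGINE=InnoDB;

-- Selecting on category 'a' alone means listing its pairs explicitly.
SELECT `EventId`, `CategorySubcategory`
FROM `Bonza`
WHERE `CategorySubcategory` IN ('ax', 'ay', 'az');
```

Keep in mind that an invalid value such as 'cz' is only rejected with an error when strict SQL mode is enabled, and that every change to the set of pairs means an ALTER TABLE.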

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow