Is It Okay to Replace a Reference ("Type") Entity FK with BOOLEAN Attributes That Represent the Complete Set of Possible Values of That FK?

StackOverflow https://stackoverflow.com/questions/13982437

Question

While trying to optimize the physical data model by removing joins (denormalizing), I decided to take every value that a user might specify for CommEventPurposeType and implement each one as a BOOLEAN attribute in CommEventPurpose. I would then discard the CommEventPurposeType table and its FK in CommEventPurpose.

I will subsequently use CHECK constraints to ensure only one BOOLEAN attribute can be TRUE for every instance of CommEventPurpose.

What are the performance and space tradeoffs of adopting this approach?

Platform: MySQL


Solution

Before version 8.0.16, MySQL does not enforce CHECK constraints. The syntax for a CHECK constraint is accepted, but the constraint is parsed and then ignored; MySQL does NOT enforce it. (From 8.0.16 onward, CHECK constraints are enforced.) On older versions you could, of course, use triggers to enforce that type of constraint yourself, using both a BEFORE INSERT and a BEFORE UPDATE trigger.
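
The trigger approach mentioned above might look roughly like the following. This is only a sketch: the three is_* columns and the key column are hypothetical stand-ins (the question does not list the actual purpose types), "only one" is read here as "exactly one", SIGNAL needs MySQL 5.5 or later, and DELIMITER is a mysql command-line client directive rather than server SQL.

```sql
-- Hypothetical table: the is_* flags stand in for the real purpose types.
CREATE TABLE CommEventPurpose (
  comm_event_purpose_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  is_sales   TINYINT(1) NOT NULL DEFAULT 0,
  is_support TINYINT(1) NOT NULL DEFAULT 0,
  is_billing TINYINT(1) NOT NULL DEFAULT 0
);

DELIMITER $$

CREATE TRIGGER CommEventPurpose_bi
BEFORE INSERT ON CommEventPurpose
FOR EACH ROW
BEGIN
  -- Each comparison yields 1 or 0 in MySQL, so the sum counts the flags that are set.
  IF (NEW.is_sales <> 0) + (NEW.is_support <> 0) + (NEW.is_billing <> 0) <> 1 THEN
    SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = 'Exactly one purpose flag must be set';
  END IF;
END$$

CREATE TRIGGER CommEventPurpose_bu
BEFORE UPDATE ON CommEventPurpose
FOR EACH ROW
BEGIN
  -- Same rule as the insert trigger, applied to the updated row.
  IF (NEW.is_sales <> 0) + (NEW.is_support <> 0) + (NEW.is_billing <> 0) <> 1 THEN
    SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = 'Exactly one purpose flag must be set';
  END IF;
END$$

DELIMITER ;
```

Note that both triggers have to be kept in sync whenever a flag column is added or dropped, which is part of the maintenance cost of this design.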

But if you want only one value to be selected, then a much better option would be a single column of ENUM datatype. The ENUM datatype allows for only ONE value out of a predefined list of values to be assigned. And MySQL does enforce that.
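
As a minimal sketch of the ENUM alternative, assuming a hypothetical member list ('sales', 'support', 'billing') and hypothetical column names, the table could be declared like this:

```sql
-- Hypothetical member list; substitute the real purpose types.
CREATE TABLE CommEventPurpose (
  comm_event_purpose_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  purpose_type ENUM('sales', 'support', 'billing') NOT NULL
);

-- A listed value is accepted.
INSERT INTO CommEventPurpose (purpose_type) VALUES ('support');

-- 'marketing' is not in the list; in strict SQL mode this statement fails.
INSERT INTO CommEventPurpose (purpose_type) VALUES ('marketing');
```

Because the column holds a single value, the "only one" rule needs no extra constraint or trigger.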

(When strict SQL mode is not enabled, MySQL is a little lax: when an invalid value is assigned, rather than throwing an error, MySQL issues a warning and silently stores the empty string '' as a "no value" placeholder, which has the ENUM index 0.)
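
The difference between the two modes can be seen with a small experiment. The table name enum_demo and its member list are made up for illustration:

```sql
-- Hypothetical demo table.
CREATE TABLE enum_demo (
  purpose_type ENUM('sales', 'support', 'billing') NOT NULL
);

-- Non-strict mode for this session only.
SET SESSION sql_mode = '';
INSERT INTO enum_demo (purpose_type) VALUES ('marketing');  -- accepted, with a warning
SHOW WARNINGS;

-- Adding 0 exposes the numeric ENUM index; the bad row holds '' with index 0.
SELECT purpose_type, purpose_type + 0 AS enum_index FROM enum_demo;

-- Strict mode: the same insert is now rejected with an error.
SET SESSION sql_mode = 'STRICT_ALL_TABLES';
INSERT INTO enum_demo (purpose_type) VALUES ('marketing');
```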

An ENUM is going to give considerable space savings in the row compared to separately stored boolean columns. An ENUM column takes 1 byte per row for up to 255 members (2 bytes beyond that), whereas each boolean column costs at least a byte of its own, however you plan to store a boolean type, whether as a single character or as a TINYINT.


You also asked about performance.

You'll get better performance with a single ENUM column than you will with individually stored "boolean" columns: the row is shorter, there are fewer NULL indicators, more rows fit per block, the index is on just one column rather than on multiple columns, and the "ONLY one" rule is enforced automatically rather than through the overhead of invoking stored programs (triggers).


As far as design goes, using an ENUM datatype instead of a foreign key to a lookup table is perfectly acceptable, in particular if you would otherwise usually be performing a join to the lookup table just to retrieve a string value to display on a screen or report.

The caveat is: it's fine to eliminate a "lookup" table, as long as you are not eliminating an "entity" table. By "entity" table, I mean a table that holds rows which represent "a person, place, thing, concept or event, which can be uniquely identified, and is important to the business."

So, for example, a "status" column containing 'open','closed','pending','canceled','delayed', etc. is a perfect candidate for an ENUM, because these aren't individually identifiable "entities", unlike the real "entities" that we are truly concerned with: customers, orders, shipments, payments, et al.


FOLLOW-UP

There's no convenient mechanism for obtaining the list of valid values for an ENUM; in my experience, most developers prefer to have a table that they can run a "lookup" query against, following their normal pattern.
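
For what it's worth, the closest thing to such a mechanism is reading the column definition back from the data dictionary. The sketch below assumes a hypothetical ENUM column named purpose_type on CommEventPurpose:

```sql
-- Returns the full type definition as one string,
-- of the form enum('sales','support','billing').
SELECT COLUMN_TYPE
  FROM information_schema.COLUMNS
 WHERE TABLE_SCHEMA = DATABASE()
   AND TABLE_NAME   = 'CommEventPurpose'
   AND COLUMN_NAME  = 'purpose_type';

-- The same definition appears in the Type column of:
SHOW COLUMNS FROM CommEventPurpose LIKE 'purpose_type';
```

Either way the application gets back a single string that it still has to parse into individual values, which is why this hardly counts as convenient.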

One thing I do add to the "lookup" table is a seq (sequence) column that specifies the order in which things should be presented in a drop-down list (because sometimes the requirement is that they be listed in an order that is not alphabetical and not easily derived from the stored string values).
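
A lookup table with such a column might be sketched as follows. CommEventPurposeType is the table from the question; the column names other than seq and the sample rows are invented for illustration:

```sql
-- Hypothetical columns and rows; only the seq idea comes from the text above.
CREATE TABLE CommEventPurposeType (
  purpose_type_code VARCHAR(20) NOT NULL PRIMARY KEY,
  description       VARCHAR(80) NOT NULL,
  seq               SMALLINT UNSIGNED NOT NULL  -- display order in the drop-down
);

INSERT INTO CommEventPurposeType (purpose_type_code, description, seq) VALUES
  ('support', 'Customer support', 1),
  ('sales',   'Sales inquiry',    2),
  ('billing', 'Billing question', 3);

-- The normal "lookup" query, in presentation order rather than alphabetical order.
SELECT purpose_type_code, description
  FROM CommEventPurposeType
 ORDER BY seq;
```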

I've successfully implemented the ENUM datatype in place of a foreign key to a lookup table. It gives a slightly cleaner data model (it avoids a distracting and unnecessary relationship line on the diagram) and improves the performance of the application, because it avoids the JOIN to that lookup table. From the client side, it works just like a VARCHAR column in terms of select/insert/update.

Licensed under: CC-BY-SA with attribution