Database design - adding flags for exceptions/extras

Question 1

There are lots of opinions about flags in databases, so the common answer is "well, it depends on what you want your RDBMS to be doing".

The student information system I work with on a daily basis has a status flag in the base student table. The legal values are A - Active, I - Inactive, P - Pre-registered, and G - Graduated. There's no validation table or lookup table for this. It's hard-coded in the application. While relationally that's a problem, the application works perfectly. A student always has one and exactly one status, and there's no situation not covered by the existing status lists. You could add a regtb_status lookup table and add a foreign key constraint to the student registration table, but it doesn't add much to this application.

For your Booking example, I would have a current status field in the Booking table itself. I would prefer to use a character field so I could support the statuses that I know I might need: A - Active, C - Cancelled by Customer, I - Invalid, D - Deleted by Staff, etc. If you back those codes with a validation table, you can even give the customer access to it so they can create custom statuses if they want. It depends on the workflow you're envisioning and what your customers want.
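As a rough sketch of that layout, here is a Booking table with a single-character status column backed by a validation table. It uses Python's built-in sqlite3 module only so that it runs anywhere; the table and column names (booking_status, booking, code, status) are my own assumptions, not something from your schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    -- Hypothetical validation (lookup) table for the status codes.
    CREATE TABLE booking_status (
        code        TEXT PRIMARY KEY CHECK (length(code) = 1),
        description TEXT NOT NULL
    );
    INSERT INTO booking_status VALUES
        ('A', 'Active'),
        ('C', 'Cancelled by Customer'),
        ('I', 'Invalid'),
        ('D', 'Deleted by Staff');

    -- Every booking has exactly one status, never NULL.
    CREATE TABLE booking (
        booking_id INTEGER PRIMARY KEY,
        status     TEXT NOT NULL DEFAULT 'A' REFERENCES booking_status(code)
    );
""")

conn.execute("INSERT INTO booking (booking_id) VALUES (1)")
conn.execute("INSERT INTO booking (booking_id, status) VALUES (2, 'C')")

# A code that is not in the validation table is refused by the foreign key.
try:
    conn.execute("INSERT INTO booking (booking_id, status) VALUES (3, 'X')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("""
    SELECT b.booking_id, s.description
    FROM booking AS b
    JOIN booking_status AS s ON s.code = b.status
    ORDER BY b.booking_id
""").fetchall()
print(rows, rejected)
```

Adding a custom status is then just an INSERT into booking_status, with no change to the booking table.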

Elsewhere in the same system, there are a lot of status flag fields that are hard-coded CHAR(1) fields holding Y - Yes or N - No. You probably should use your RDBMS's boolean type for these flags, but unless you're talking ridiculous numbers of records or need to worry about internationalization, it's not going to be an issue. The tables that hold these flags are typically also functioning as junction tables. For example, the table that relates students to contacts includes status flags for whether the contact is living with the student, the type of contact (guardian, emergency contact), what the contact's relationship is to the student (mother, father, aunt, etc.), whether or not that contact should have access to the student in the parent website, the order of priority of the contacts, whether the parent should receive report cards in the mail, etc. This particular table is somewhat cumbersome simply because there are over a dozen flag fields in it, but the flags with multiple options, such as relationship type, are completely configurable in validation/lookup tables within the application, and the column names are, at least in part, self-documenting. From a report-writing standpoint that's invaluable.
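To make the junction-table idea concrete, here is a cut-down sketch of a student/contact table with a few Y/N flags and one lookup-backed field. All names are invented for illustration and the real table has far more columns; sqlite3 is used only so the example is runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce REFERENCES

conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE contact (contact_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Configurable lookup table for the multi-option field.
    CREATE TABLE relationship_type (code TEXT PRIMARY KEY, description TEXT NOT NULL);

    -- Junction table: one row per (student, contact) pair, carrying the flags.
    CREATE TABLE student_contact (
        student_id         INTEGER NOT NULL REFERENCES student(student_id),
        contact_id         INTEGER NOT NULL REFERENCES contact(contact_id),
        relationship_code  TEXT NOT NULL REFERENCES relationship_type(code),
        lives_with_student TEXT NOT NULL DEFAULT 'N' CHECK (lives_with_student IN ('Y', 'N')),
        web_access         TEXT NOT NULL DEFAULT 'N' CHECK (web_access IN ('Y', 'N')),
        mail_report_card   TEXT NOT NULL DEFAULT 'N' CHECK (mail_report_card IN ('Y', 'N')),
        priority           INTEGER NOT NULL,
        PRIMARY KEY (student_id, contact_id)
    );

    INSERT INTO student VALUES (1, 'Sam');
    INSERT INTO contact VALUES (1, 'Maria'), (2, 'Joe');
    INSERT INTO relationship_type VALUES ('M', 'Mother'), ('U', 'Uncle');
    INSERT INTO student_contact VALUES
        (1, 1, 'M', 'Y', 'Y', 'Y', 1),
        (1, 2, 'U', 'N', 'N', 'N', 2);
""")

# Report query: who gets Sam's report card in the mail?
report_card_contacts = conn.execute("""
    SELECT c.name, r.description
    FROM student_contact AS sc
    JOIN contact AS c ON c.contact_id = sc.contact_id
    JOIN relationship_type AS r ON r.code = sc.relationship_code
    WHERE sc.student_id = 1 AND sc.mail_report_card = 'Y'
    ORDER BY sc.priority
""").fetchall()

# The CHECK constraint keeps anything other than Y/N out of a flag column.
try:
    conn.execute("UPDATE student_contact SET web_access = 'X' WHERE contact_id = 2")
    bad_flag_rejected = False
except sqlite3.IntegrityError:
    bad_flag_rejected = True

print(report_card_contacts, bad_flag_rejected)
```

Because every flag is NOT NULL with a default, a report can filter on column = 'N' without worrying about missing values.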

We have a few fields that are stored in user-defined tables, which actually store everything in an EAV table in the DB. These cause a problem because, often, the particular EAV record doesn't exist until the school explicitly sets it. The application behaves as though null = No, but it can make writing reports and even searching in the application difficult. You can't look for field = 'N'. You have to look for field = 'N' OR field IS NULL. In the application's search system, you have to specify field <> 'Y' because it doesn't handle nulls well in all cases. This is very confusing for users who can't wrap their heads around three-valued logic. It's also fairly irritating for a DBA because the best way to view the data, a view, is not easily updated.
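Here is a small reproduction of the missing-row problem in plain SQL, run through Python's sqlite3. The student_eav table and the bus_rider attribute are made up for the example; student 3 simply has no EAV row, so the LEFT JOIN yields NULL for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY);

    -- Hypothetical EAV table: a row exists only once someone sets the value.
    CREATE TABLE student_eav (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        attribute  TEXT NOT NULL,
        value      TEXT NOT NULL,
        PRIMARY KEY (student_id, attribute)
    );

    INSERT INTO student VALUES (1), (2), (3);
    INSERT INTO student_eav VALUES (1, 'bus_rider', 'Y'), (2, 'bus_rider', 'N');
""")


def student_ids(predicate):
    """Return the student ids matching a WHERE predicate on the joined EAV value."""
    sql = f"""
        SELECT s.student_id
        FROM student AS s
        LEFT JOIN student_eav AS e
               ON e.student_id = s.student_id AND e.attribute = 'bus_rider'
        WHERE {predicate}
        ORDER BY s.student_id
    """
    return [row[0] for row in conn.execute(sql)]


only_n = student_ids("e.value = 'N'")                        # misses student 3
n_or_null = student_ids("e.value = 'N' OR e.value IS NULL")  # treats "not set" as No
coalesced = student_ids("COALESCE(e.value, 'N') = 'N'")      # same result, one predicate

print(only_n, n_or_null, coalesced)
```

COALESCE is a convenient way to bake the "null = No" rule into a reporting view, at the cost of the view no longer being trivially updatable.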

In my experience, bitmasks are almost always incorrect. They're very cumbersome and expensive to query against, not self-documenting, and generally a tremendous pain in the tail. I would rather see a series of BIT/BOOLEAN or CHAR fields any day than a bitmask. Any design that packs multiple attributes into a single field is going to be a tremendous problem.
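For comparison, here is the same question ("which bookings asked for a disabled slot?") asked of a bitmask column and of separate flag columns. The flag names and bit values are hypothetical, and sqlite3 is used only to keep the sketch runnable.

```python
import sqlite3

# Bit assignments live in application code; the database knows nothing about them.
CARPARK = 1
DISABLED_SLOT = 2

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE booking_mask (
        booking_id INTEGER PRIMARY KEY,
        flags      INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE booking_cols (
        booking_id          INTEGER PRIMARY KEY,
        needs_carpark       INTEGER NOT NULL DEFAULT 0 CHECK (needs_carpark IN (0, 1)),
        needs_disabled_slot INTEGER NOT NULL DEFAULT 0 CHECK (needs_disabled_slot IN (0, 1))
    );

    -- The same three bookings in both layouts.
    INSERT INTO booking_mask VALUES (1, 1), (2, 3), (3, 0);
    INSERT INTO booking_cols VALUES (1, 1, 0), (2, 1, 1), (3, 0, 0);
""")

# Bitmask: the reader has to know what bit 2 means, and an ordinary index
# on flags cannot be used to satisfy a bitwise test.
mask_ids = [row[0] for row in conn.execute(
    "SELECT booking_id FROM booking_mask WHERE (flags & ?) <> 0 ORDER BY booking_id",
    (DISABLED_SLOT,),
)]

# Separate columns: the query documents itself.
column_ids = [row[0] for row in conn.execute(
    "SELECT booking_id FROM booking_cols WHERE needs_disabled_slot = 1 ORDER BY booking_id"
)]

print(mask_ids, column_ids)
```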

For your SubscribersTwitterHandles question, I guess I'm a little confused. Why didn't they just add a column to the existing table? Is it a one-to-many relationship, or are there multiple Twitter Handle fields? Either your customers haven't given you their handle -- in which case it's explicitly '' -- or it's the handle they gave you.

I guess my real question from a design standpoint: Are we creating flags or tags? In my mind a flag is something that has a one-to-one relationship with an existing entity in the database. That entity might be the junction between two entities, or it might be on the entity itself, but it always has a non-null value.

Tags, on the other hand, are arbitrary, potentially many-to-one or many-to-many, and in most situations are completely defined by the customer as an ad hoc means to group records.
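One way to picture the distinction: a flag is a NOT NULL column on the entity, while tags live in their own table and attach through a junction table. The sketch below assumes a booking table with an is_paid flag and customer-defined tags; all of the names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Flag: exactly one non-null value per booking.
    CREATE TABLE booking (
        booking_id INTEGER PRIMARY KEY,
        is_paid    INTEGER NOT NULL DEFAULT 0 CHECK (is_paid IN (0, 1))
    );

    -- Tags: arbitrary, customer-defined, many-to-many with bookings.
    CREATE TABLE tag (
        tag_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL UNIQUE
    );
    CREATE TABLE booking_tag (
        booking_id INTEGER NOT NULL REFERENCES booking(booking_id),
        tag_id     INTEGER NOT NULL REFERENCES tag(tag_id),
        PRIMARY KEY (booking_id, tag_id)
    );

    INSERT INTO booking VALUES (1, 1), (2, 0), (3, 0);
    INSERT INTO tag VALUES (1, 'vip'), (2, 'follow-up');
    INSERT INTO booking_tag VALUES (1, 1), (1, 2), (2, 2);
""")

# Collect the tags per booking; a booking may have zero, one, or many.
tags_by_booking = {}
for booking_id, name in conn.execute("""
    SELECT b.booking_id, t.name
    FROM booking AS b
    LEFT JOIN booking_tag AS bt ON bt.booking_id = b.booking_id
    LEFT JOIN tag AS t ON t.tag_id = bt.tag_id
    ORDER BY b.booking_id, t.name
"""):
    tags_by_booking.setdefault(booking_id, [])
    if name is not None:
        tags_by_booking[booking_id].append(name)

print(tags_by_booking)
```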

Question 2

I'll share my opinion from a database design perspective.

Think about your entities and their properties. In relational database design, properties map to columns and entities map to tables.
If you decide that the newly added subject is an entity in its own right, it is better to create a new table for it and relate it to the others either with a foreign key or with a separate table that holds the relationship.
If you think it is solely another property of an existing entity, it is better to add a column to that entity's table.
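Applied to the Twitter handle case from the question, the two choices might look roughly like this. Both options are shown side by side in one database purely for comparison; the subscriber table, its columns, and the sample rows are assumptions for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriber (
        subscriber_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL
    );
    INSERT INTO subscriber VALUES (1, 'Ann'), (2, 'Bob');

    -- Option 1: the handle is a property of the subscriber, so add a column.
    ALTER TABLE subscriber ADD COLUMN twitter_handle TEXT NOT NULL DEFAULT '';

    -- Option 2: handles are an entity of their own (a subscriber may have
    -- several), so they get a table related back by a foreign key.
    CREATE TABLE subscriber_twitter_handle (
        subscriber_id INTEGER NOT NULL REFERENCES subscriber(subscriber_id),
        handle        TEXT NOT NULL,
        PRIMARY KEY (subscriber_id, handle)
    );
""")

conn.execute("UPDATE subscriber SET twitter_handle = '@ann' WHERE subscriber_id = 1")
conn.executemany(
    "INSERT INTO subscriber_twitter_handle VALUES (?, ?)",
    [(2, "@bob"), (2, "@bob_work")],
)

# Option 1: one value per subscriber, '' when no handle was given.
column_values = conn.execute(
    "SELECT subscriber_id, twitter_handle FROM subscriber ORDER BY subscriber_id"
).fetchall()

# Option 2: zero or more rows per subscriber.
handle_counts = conn.execute("""
    SELECT s.subscriber_id, COUNT(h.handle)
    FROM subscriber AS s
    LEFT JOIN subscriber_twitter_handle AS h ON h.subscriber_id = s.subscriber_id
    GROUP BY s.subscriber_id
    ORDER BY s.subscriber_id
""").fetchall()

print(column_values, handle_counts)
```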

These are very basic database design techniques, but people sometimes trade them off against easier coding or simpler queries. I consider that a different story, though.

Question 3

It depends :)

You have to understand how the data will be used. If you have trillions of tables for flags, your queries will need lots of joins to retrieve all the information.

If you do not want to search in those columns, then it could be a flag column (or a separate table for all the flags with several columns). Some RDBMSs have dedicated types for this (MySQL's 'enum' holds one value from a fixed list, and its 'set' type holds several values in one column). You can also store your flags in bitmasks (integers).

If you want to search these flags (and the flag is the primary filter), a separate table could help. Just join those tables and that's it, but with multiple search criteria it will be hard to implement. (Imagine that you want to search for all records where the carpark flag or the disabled slot is requested.)
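To show why the multi-criteria case gets awkward, here is a sketch with one table per flag, where a row's presence means the flag is set. The table names are invented; the point is the difference between the single-flag query and the "either flag" query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE booking (booking_id INTEGER PRIMARY KEY);

    -- One table per flag: a row means "this booking requested it".
    CREATE TABLE booking_carpark (
        booking_id INTEGER PRIMARY KEY REFERENCES booking(booking_id)
    );
    CREATE TABLE booking_disabled_slot (
        booking_id INTEGER PRIMARY KEY REFERENCES booking(booking_id)
    );

    INSERT INTO booking VALUES (1), (2), (3), (4);
    INSERT INTO booking_carpark VALUES (1), (2);
    INSERT INTO booking_disabled_slot VALUES (2), (3);
""")

# Single criterion: a plain inner join is enough.
carpark_only_query = [row[0] for row in conn.execute("""
    SELECT b.booking_id
    FROM booking AS b
    JOIN booking_carpark AS c ON c.booking_id = b.booking_id
    ORDER BY b.booking_id
""")]

# "Carpark OR disabled slot": every flag table needs an outer join
# plus a presence test, and this grows with each extra criterion.
either_flag_query = [row[0] for row in conn.execute("""
    SELECT b.booking_id
    FROM booking AS b
    LEFT JOIN booking_carpark AS c ON c.booking_id = b.booking_id
    LEFT JOIN booking_disabled_slot AS d ON d.booking_id = b.booking_id
    WHERE c.booking_id IS NOT NULL OR d.booking_id IS NOT NULL
    ORDER BY b.booking_id
""")]

print(carpark_only_query, either_flag_query)
```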

You can also store them in key-value 'pairs' (bookingId, flagType), this is useful, when there are custom flags to set.
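A minimal sketch of that (bookingId, flagType) layout follows. The flag names are made up, and a custom flag such as late_checkout needs no schema change, only a new row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per flag that is set on a booking.
    CREATE TABLE booking_flag (
        booking_id INTEGER NOT NULL,
        flag_type  TEXT NOT NULL,
        PRIMARY KEY (booking_id, flag_type)
    );
    INSERT INTO booking_flag VALUES
        (1, 'carpark'),
        (2, 'carpark'),
        (2, 'disabled_slot'),
        (3, 'late_checkout');
""")

wanted = ("carpark", "disabled_slot")

# Bookings with at least one of the wanted flags.
any_flag = [row[0] for row in conn.execute(
    """
    SELECT DISTINCT booking_id
    FROM booking_flag
    WHERE flag_type IN (?, ?)
    ORDER BY booking_id
    """,
    wanted,
)]

# Bookings with all of the wanted flags: count the matches per booking.
all_flags = [row[0] for row in conn.execute(
    """
    SELECT booking_id
    FROM booking_flag
    WHERE flag_type IN (?, ?)
    GROUP BY booking_id
    HAVING COUNT(*) = 2
    ORDER BY booking_id
    """,
    wanted,
)]

print(any_flag, all_flags)
```

The composite primary key is what makes the HAVING COUNT(*) trick safe, since a booking cannot carry the same flag twice.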

Once more: understand your data and understand how your RDBMS works. You have to consider whether you want to optimize for storage space or for other resources (CPU usage, memory, disk IO, etc). There will always be pros and cons. When you cannot decide which implementation is best, set up some test cases and measure the most important metrics to get more information.
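A test case of that kind can be very small. The sketch below loads the same made-up data into a flag-column layout and a key-value layout, then times one query against each with timeit. The table names, row count, and repetition count are arbitrary assumptions, and the timings only mean something on your own RDBMS with your own data.

```python
import sqlite3
import timeit

N = 20_000  # arbitrary test size; use something close to your real volume

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE booking_cols (
        booking_id    INTEGER PRIMARY KEY,
        needs_carpark INTEGER NOT NULL
    );
    CREATE TABLE booking_flag (
        booking_id INTEGER NOT NULL,
        flag_type  TEXT NOT NULL,
        PRIMARY KEY (booking_id, flag_type)
    );
""")

# Same data in both layouts: every odd booking id requests a carpark.
conn.executemany(
    "INSERT INTO booking_cols VALUES (?, ?)",
    ((i, i % 2) for i in range(N)),
)
conn.executemany(
    "INSERT INTO booking_flag VALUES (?, 'carpark')",
    ((i,) for i in range(N) if i % 2),
)

QUERIES = {
    "flag column": "SELECT COUNT(*) FROM booking_cols WHERE needs_carpark = 1",
    "key-value rows": "SELECT COUNT(*) FROM booking_flag WHERE flag_type = 'carpark'",
}

# First check that both layouts give the same answer, then time them.
counts = {name: conn.execute(sql).fetchone()[0] for name, sql in QUERIES.items()}
timings = {
    name: timeit.timeit(lambda sql=sql: conn.execute(sql).fetchone(), number=50)
    for name, sql in QUERIES.items()
}

for name in QUERIES:
    print(f"{name}: {counts[name]} rows, {timings[name]:.4f}s for 50 runs")
```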

EDIT: in your specific case, I think, these flags won't act as filters, so you can store them in a column (separate one for each or grouped in bitmasks).