Question

I am working on the redesign of an old database which started off small and is now really bloated and slow due to years of quick fixes when system changes occurred. No matter how well it is designed this time, there will of course be unforeseen changes so I am looking for some general tips on how best to prepare for such changes along with general advice on whether or not I am on the right track. I am very new to the software development / database design world so please forgive me if there are some glaringly obvious issues here or I am being a little too vague.... I'm trying my best :)

To be specific;

A reservation will be made on a website. At the time of booking, some extras / requirements may be added, e.g. a carpark space is booked - the user will indicate whether or not a disabled space is required. I am going to create another table, 'DisabledSpacesRequired', which will have one column - the bookingIDs of those bookings where a disabled space was required. Is this "better" than having a flag in the booking table indicating whether or not the space is required?

Similarly, a booking may be cancelled - so there will be a table of cancelled bookings. For searching later, would it be better to simply search the cancelled bookings table for the bookingID? Or have a flag in the booking table indicating whether or not it was cancelled? (The 'CancelledBookings' table will be necessary anyway but should a flag also be included?)

What has got me thinking about such issues is the fact that there seem to be lots of add-ons currently in the database - e.g. there is a 'Subscribers' table, and there is a 'SubscribersTwitterHandles' table which was added later - is it good practice to separate out types of subscribers in this way? Or should flags be added to the existing table?

I've had a look for some similar questions and, going by "Implementing Review flags in Databases; best practices", I think it is best to separate out variables to prepare for changes that might be made in the future. (For example, we might want to add some information related to the disabled parking space required.)

Hope I am clear - any advice is greatly appreciated.

Solution

There are lots of opinions about flags in databases. So the common answer is "well, it depends what you want your RDBMS to be doing".

The student information system I work with on a daily basis has a status flag in the base student table. The legal values are A - Active, I - Inactive, P - Pre-registered, and G - Graduated. There's no validation table or lookup table for this. It's hard-coded in the application. While relationally that's a problem, the application works perfectly. A student always has one and exactly one status, and there's no situation not covered by the existing status lists. You could add a regtb_status lookup table and add a foreign key constraint to the student registration table, but it doesn't add much to this application.

For your Booking example, I would have a current status field in the Booking table itself. I would prefer to use a character field so I could support the statuses that I know I might need: A - Active, C - Cancelled by Customer, I - Invalid, D - Deleted by Staff, etc. You can even allow the customer to have access to the validation table so they can create custom statuses if they want. It depends on the workflow you're envisioning and what your customers want.
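To make that concrete, here is a minimal sketch of a status column backed by a validation table, using Python's built-in sqlite3 module. The table and column names (booking, booking_status, code, label) are made up for illustration, and SQLite stands in for whatever RDBMS you actually use, so treat it as one possible layout rather than the definitive one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Hypothetical validation (lookup) table holding the legal statuses.
CREATE TABLE booking_status (
    code  TEXT PRIMARY KEY CHECK (length(code) = 1),
    label TEXT NOT NULL
);
INSERT INTO booking_status VALUES
    ('A', 'Active'),
    ('C', 'Cancelled by Customer'),
    ('I', 'Invalid'),
    ('D', 'Deleted by Staff');

-- Every booking has exactly one status; new bookings default to Active.
CREATE TABLE booking (
    booking_id INTEGER PRIMARY KEY,
    status     TEXT NOT NULL DEFAULT 'A' REFERENCES booking_status(code)
);
""")

conn.execute("INSERT INTO booking (booking_id) VALUES (1)")
conn.execute("INSERT INTO booking (booking_id, status) VALUES (2, 'C')")


def rejects_unknown_status(connection):
    """Return True if the foreign key blocks a status that is not in the lookup table."""
    try:
        connection.execute("INSERT INTO booking (booking_id, status) VALUES (3, 'X')")
    except sqlite3.IntegrityError:
        return True
    return False


# Finding cancelled bookings is a plain filter on the status column.
cancelled = [row[0] for row in conn.execute(
    "SELECT booking_id FROM booking WHERE status = 'C' ORDER BY booking_id")]
```

Adding a new status later is then an INSERT into the lookup table rather than a schema change.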

Elsewhere in the same system, there are a lot of status flag fields that are hard-coded CHAR(1) fields that are Y - Yes and N - No. You probably should use your RDBMS's boolean types for these flags, but unless you're talking ridiculous numbers of records or need to worry about internationalization, it's not going to be an issue. These types of tables are typically also functioning as junction tables. For example, the table that relates students to contacts includes status flags for whether the contact is living with the student, the type of contact (guardian, emergency contact), what the contact's relationship is to the student (mother, father, aunt, etc.), whether or not that contact should have access to the student in the parent website, the order of priority of the contacts, whether the parent should receive report cards in the mail, etc. This particular table is somewhat cumbersome simply because there are over a dozen flag fields in it, but the flags with multiple options, such as relationship type, are completely configurable in validation/lookup tables within the application, and the column names are, at least in part, self-documenting. From a report-writing standpoint that's invaluable.

We have a few fields that are stored in user-defined tables, which actually store everything in an EAV table in the DB. These cause a problem because, often, the particular EAV record doesn't exist until the school explicitly sets it. The application behaves as though null = No, but it can make writing reports and even searching in the application difficult. You can't look for field = 'N'. You have to look for field = 'N' OR field IS NULL. In the application's search system, you have to specify field <> 'Y' because it doesn't handle nulls well in all cases. This is very confusing for users who can't wrap their heads around three-valued logic. It's also fairly irritating for a DBA because the best way to view the data, a view, is not easily updated.
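A small sketch of the null problem, again using Python's sqlite3 module. The student_flag table and bus_rider column are invented for the example. Note that in plain SQL a `<> 'Y'` comparison also skips NULL rows (unlike the search system in the application described above), so a report has to ask for NULL explicitly or coalesce it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flag that is 'Y', 'N', or NULL when nobody has set it yet.
CREATE TABLE student_flag (
    student_id INTEGER PRIMARY KEY,
    bus_rider  TEXT
);
INSERT INTO student_flag VALUES (1, 'Y'), (2, 'N'), (3, NULL);
""")


def ids(where_clause):
    """Return the student ids matching a WHERE clause (trusted input only)."""
    sql = f"SELECT student_id FROM student_flag WHERE {where_clause} ORDER BY student_id"
    return [row[0] for row in conn.execute(sql)]


# Only the explicit 'N' matches; the unset row is silently left out.
only_explicit_no = ids("bus_rider = 'N'")

# NULL <> 'Y' evaluates to unknown, so the unset row is left out here too.
not_yes = ids("bus_rider <> 'Y'")

# Two ways to treat "never set" as No.
no_or_unset = ids("bus_rider = 'N' OR bus_rider IS NULL")
coalesced = ids("COALESCE(bus_rider, 'N') = 'N'")
```

Declaring the flag NOT NULL with a default avoids the issue entirely for ordinary columns; it is the EAV layout, where the row may not exist at all, that makes it hard to escape.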

In my experience, bitmasks are almost always incorrect. They're very cumbersome and expensive to query against, not self-documenting, and generally a tremendous pain in the tail. I would rather see a series of BIT/BOOLEAN or CHAR fields any day than a bitmask. If it has multiple attributes in a single field, it's going to be a tremendous problem.
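As a rough illustration of the difference, the sketch below stores the same two flags once as a bitmask and once as separate columns, using Python's sqlite3 module. The bit assignments and the table and column names are assumptions made up for the example.

```python
import sqlite3

# Hypothetical bit assignments; nothing in the schema records what they mean.
CARPARK_BOOKED = 1
DISABLED_SPACE_REQUIRED = 2

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE booking_bitmask (
    booking_id INTEGER PRIMARY KEY,
    extras     INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE booking_columns (
    booking_id              INTEGER PRIMARY KEY,
    carpark_booked          INTEGER NOT NULL DEFAULT 0 CHECK (carpark_booked IN (0, 1)),
    disabled_space_required INTEGER NOT NULL DEFAULT 0 CHECK (disabled_space_required IN (0, 1))
);
-- Booking 1: no extras. Booking 2: carpark only. Booking 3: carpark and disabled space.
INSERT INTO booking_bitmask VALUES (1, 0), (2, 1), (3, 3);
INSERT INTO booking_columns VALUES (1, 0, 0), (2, 1, 0), (3, 1, 1);
""")

# Bitmask version: the reader has to know which bit is which, and the
# expression on the column generally cannot use an ordinary index.
from_bitmask = [row[0] for row in conn.execute(
    "SELECT booking_id FROM booking_bitmask WHERE (extras & ?) <> 0 ORDER BY booking_id",
    (DISABLED_SPACE_REQUIRED,))]

# Column version: the query documents itself.
from_columns = [row[0] for row in conn.execute(
    "SELECT booking_id FROM booking_columns "
    "WHERE disabled_space_required = 1 ORDER BY booking_id")]
```

Both queries return the same bookings; the difference is in how much outside knowledge the person writing the report needs.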

For your SubscribersTwitterHandles question, I guess I'm a little confused. Why didn't they just add a column to the existing table? Is it a one-to-many relationship, or are there multiple Twitter Handle fields? Either your customers haven't given you their handle -- in which case it's explicitly '' -- or it's the handle they gave you.

I guess my real question from a design standpoint: Are we creating flags or tags? In my mind a flag is something that has a one-to-one relationship with an existing entity in the database. That entity might be the junction between two entities, or it might be on the entity itself, but it always has a non-null value.

Tags, on the other hand, are arbitrary, potentially many-to-one or many-to-many, and in most situations are completely defined by the customer as an ad hoc means to group records.

OTHER TIPS

Here is my opinion from the perspective of database design:

  • Think about your entities and their properties. In relational database design, properties map to columns and entities map to tables.
  • If you decide that the newly added subject is an entity in its own right, it is better to create a new table for it. To relate it to other entities, use either a foreign key or a separate table that holds the relationship.
  • If you think it is solely another property of an existing entity, it is better to add a column to that entity's table.

These are very basic database design techniques. People sometimes trade them off against easier coding or simpler queries, but that is a different story.

It depends :)

You have to understand how the data will be used. If you have trillions of tables for flags, your queries will contain lots of joins to retrieve all information.

If you do not want to search in those columns, then it could be a flag column (or a separate table for all the flags with several columns). You can store multiple flags in some RDBMS (MySQL's 'enum' and 'set' types for example). You can also store your flags in bitmasks (integers).

If you want to search these flags (and the flag is the primary filter), a separate table could help. Just join those tables and that's it, but with multiple search criteria it will be hard to implement. (Imagine that you want to search for all records where either the carpark flag or the disabled space is requested.)
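Here is a sketch of what that looks like with one table per flag, written with Python's sqlite3 module. The table names (carpark_booked, disabled_space_required) and the toy data are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE booking (booking_id INTEGER PRIMARY KEY);

-- One single-column table per flag: a row means the flag is set.
CREATE TABLE carpark_booked (
    booking_id INTEGER PRIMARY KEY REFERENCES booking(booking_id)
);
CREATE TABLE disabled_space_required (
    booking_id INTEGER PRIMARY KEY REFERENCES booking(booking_id)
);

INSERT INTO booking VALUES (1), (2), (3);
INSERT INTO carpark_booked VALUES (2), (3);
INSERT INTO disabled_space_required VALUES (3);
""")

# One flag as the primary filter: a single inner join is enough.
disabled_only = [row[0] for row in conn.execute("""
    SELECT b.booking_id
    FROM booking AS b
    JOIN disabled_space_required AS d ON d.booking_id = b.booking_id
    ORDER BY b.booking_id
""")]

# "Either flag" already needs one outer join per flag table plus NULL checks,
# and every further flag adds another join.
either_flag = [row[0] for row in conn.execute("""
    SELECT b.booking_id
    FROM booking AS b
    LEFT JOIN carpark_booked          AS c ON c.booking_id = b.booking_id
    LEFT JOIN disabled_space_required AS d ON d.booking_id = b.booking_id
    WHERE c.booking_id IS NOT NULL OR d.booking_id IS NOT NULL
    ORDER BY b.booking_id
""")]
```

With flag columns on the booking table, the second query would be a simple `WHERE carpark_booked = 1 OR disabled_space_required = 1`.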

You can also store them as key-value 'pairs' (bookingId, flagType). This is useful when there are custom flags to set.
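A minimal sketch of the (bookingId, flagType) idea, using Python's sqlite3 module. The booking_flag table, the flag names, and the bookings_with_any helper are hypothetical and only there to show the shape of the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE booking (booking_id INTEGER PRIMARY KEY);

-- One row per flag that is set; the composite key prevents duplicates.
CREATE TABLE booking_flag (
    booking_id INTEGER NOT NULL REFERENCES booking(booking_id),
    flag_type  TEXT    NOT NULL,
    PRIMARY KEY (booking_id, flag_type)
);

INSERT INTO booking VALUES (1), (2), (3);
INSERT INTO booking_flag VALUES
    (2, 'carpark'),
    (3, 'carpark'),
    (3, 'disabled_space'),
    (3, 'late_checkout');   -- a custom flag needs no schema change
""")


def bookings_with_any(*flag_types):
    """Return ids of bookings that have at least one of the given flags."""
    if not flag_types:
        return []
    placeholders = ", ".join("?" for _ in flag_types)
    sql = (
        "SELECT DISTINCT booking_id FROM booking_flag "
        f"WHERE flag_type IN ({placeholders}) ORDER BY booking_id"
    )
    return [row[0] for row in conn.execute(sql, flag_types)]


carpark_or_disabled = bookings_with_any("carpark", "disabled_space")
custom_flag = bookings_with_any("late_checkout")
```

The trade-off is that nothing stops a typo in flag_type unless you add a lookup table for the allowed flag names.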

Once more: understand your data and understand how your RDBMS works. You have to consider whether you want to optimize for storage space or for other resources (CPU usage, memory, disk I/O, etc.). There will always be pros and cons. When you cannot decide which implementation is best, set up some test cases and measure the most important metrics to get more information.

EDIT: in your specific case, I think, these flags won't act as filters, so you can store them in a column (separate one for each or grouped in bitmasks).

Licensed under: CC-BY-SA with attribution