سؤال

I was asked to use a database in which most of the primary keys, and other fields as well, uses char(n) to store numeric values with padding, for example:

product_id: char(8) [00005677]
user_id: char(6) [000043]
category_id: char(2) [05]

The reason they want to use it like that, is to be able to use characters (in the far future) if they want. However they have many rules based in numbers, for example, category_id from 01 to 79 correspond to a general category and from 80 to 89 is a special category and 90 to 99 is user defined category.

I personally think that using char(n) to store numbers is a bad practice. My reasons are:

  1. using char, " " != 0, 0 != 00, 05 != 5, 00043 != 000043, and so on. For that reason, the values have to be constantly checked (to prevent data corruption).
  2. If I pad a number: 0 -> 00, then I have to pay attention not to pad a character (A -> 0A)
  3. If characters are used, then ranges become strange, something like: from 01 to 79 and AB and RX and TZ and S, etc...
  4. Indexing numbers instead of chars result in a performance gain

I'm proposing to change it to decimal(n) with zerofill to make it more "error-proof", as this information is modified by different sources (web, windows client, upload csv). If they want to add more categories, for example, then updating from decimal(2) to decimal(3) will be easier.

My question then is: Am I wrong? can char(n) be trusted for this task? If "chars" are evil with numbers, then which other disadvantages am I missing in the above list (I may need better reasons if I want to win my case)?

TIA (any comment/answer will be appreciated).

هل كانت مفيدة؟

المحلول

If this was SQL Server or Oracle or any other RDBMS, I would recommend enforcing a check constraint on those columns so that the data always matched the full capacity of the column - this would ensure your identifiers are uniform.

Unfortunately MySQL doesn't support this.

While it wouldn't stop the annoyance of having to pad things coming into the database or in search routines, on the client or in procs in the database, it would guarantee you that the fields were clean at the lowest level.

I find using constraints like this help avoid things getting badly out of hand.

As far as the optimization by using numbers, if they have to accommodate non-numeric characters in the future, that's not going to be an option.

It is very common to have natural keys (which could be candidates for a primary key) with varchar/char data, but yet instead enforce referential integrity on surrogate keys (usually some kind of autonumbering integer which is simply an internal reference, and often the clustered index and primary key).

نصائح أخرى

Quoting your question:

...store numeric values with padding...

You did not show any examples of numeric data, only character data that happens to consist of numbers. If you had said that their OrderTotal column was a char(10), then I'd start to worry.

Just treat this as character data and you will be fine. I can see no business or technical case to change the database (unless you are beginning a near-total rewrite).

Regarding performance... If this is actually a concern, then you most likely have far bigger issues to deal with. MySQL is fast and accurate.

--

Write a function somewhere that will zerofill user inputted ID's for the purpose of querying. Use this function everywhere you need to accept user input. NEVER EVER use a numeric data type to store your data (if PHP, never use +, always use . to concat, etc...)

Remember, this is no different than Item_Number = "SHIRT123" or any other string ID you may encounter.

Take care

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top