Question

I created a database for nutritional table information. This database has values such as the carbohydrates, protein and fats of a food (28 fields in total).

The problem is that there are special cases where these values can be strings:

  • If the amount of a nutrient in the food is too small, the value will be "Tr" (trace).
  • If the measurement does not apply to that food, the value will be "NA".

There are some other cases as well.

I am currently using strings to store the values and converting them to numbers in my app. Is there a better way to do this?

Here is an example of what the table looks like:

id   kilocalories   carbohydrate_grams  lipid_grams  protein_grams
'1'  '123.5348925'  '25.80975'          'Tr'         'NA'

I am using MySQL 8.

Sorry, I think you needed a bit more information.

I left NULL enabled, but there is currently no case where a field should be NULL. I only enabled it because this database may receive new data later, with other conventions. (I am using the Brazilian food database, but I will insert the American database in the future.)

Any field in the table can receive one of these values:

  • * means that the tests are being re-evaluated;
  • an empty field means that no lab test was requested for that field;
  • NA means not applicable;
  • Tr means trace.

This is a dump of the real table with two lines of data:

--
-- Table structure for table `nutritional_table`
--

DROP TABLE IF EXISTS `nutritional_table`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!50503 SET character_set_client = utf8mb4 */;
CREATE TABLE `nutritional_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `kilocalories` varchar(255) DEFAULT NULL,
  `carbohydrate_grams` varchar(255) DEFAULT NULL,
  `lipid_grams` varchar(255) DEFAULT NULL,
  `saturados` varchar(255) DEFAULT NULL,
  `monoinsaturados` varchar(255) DEFAULT NULL,
  `poliinsaturados` varchar(255) DEFAULT NULL,
  `protein_grams` varchar(255) DEFAULT NULL,
  `dietary_fiber_grams` varchar(255) DEFAULT NULL,
  `ashes_grams` varchar(255) DEFAULT NULL,
  `cholesterol_milligrams` varchar(255) DEFAULT NULL,
  `calcium_milligrams` varchar(255) DEFAULT NULL,
  `magnesium_milligrams` varchar(255) DEFAULT NULL,
  `manganese_milligrams` varchar(255) DEFAULT NULL,
  `phosphorus_milligrams` varchar(255) DEFAULT NULL,
  `iron_milligrams` varchar(255) DEFAULT NULL,
  `sodium_milligrams` varchar(255) DEFAULT NULL,
  `potassium_milligrams` varchar(255) DEFAULT NULL,
  `copper_milligrams` varchar(255) DEFAULT NULL,
  `zinc_milligrams` varchar(255) DEFAULT NULL,
  `thiamine_milligrams` varchar(255) DEFAULT NULL,
  `riboflavin_milligrams` varchar(255) DEFAULT NULL,
  `pyridoxine_milligrams` varchar(255) DEFAULT NULL,
  `niacin_milligrams` varchar(255) DEFAULT NULL,
  `vitamin_c_milligrams` varchar(255) DEFAULT NULL,
  `retinol_micrograms` varchar(255) DEFAULT NULL,
  `RE_micrograms` varchar(255) DEFAULT NULL,
  `RAE_micrograms` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=598 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
/*!40101 SET character_set_client = @saved_cs_client */;
--
-- Dumping data for table `nutritional_table`
--

LOCK TABLES `nutritional_table` WRITE;
/*!40000 ALTER TABLE `nutritional_table` DISABLE KEYS */;
INSERT INTO `nutritional_table` VALUES
(1,'123.5348925','25.80975','1.000333333','0.3','0.4','0.3','2.58825','2.749333333','0.463','NA','5.204','58.702','0.627333333','105.853','0.262','1.244666667','75.15166667','0.020333333','0.682666667','0.08','Tr','0.08','Tr','','NA','',''),
(2,'359.678002','77.45071413','1.864833333','0.3','0.5','0.4','7.32328587','4.819166667','1.181333333','NA','7.818','109.71','2.993333333','250.865','0.948333333','1.645666667','173.34','0.074833333','1.395166667','0.261666667','Tr','0.175','4.183333333','','NA','','');
/*!40000 ALTER TABLE `nutritional_table` ENABLE KEYS */;
UNLOCK TABLES;

saturados, monoinsaturados and poliinsaturados are the same as the other fields. I still have to translate them to English (they are written in Portuguese) and add the suffix _milligrams to them. I just left that to fix later.

About the abbreviations: I used the full names because I think it is more easily readable. I get a little confused with the abbreviations. But in the app I am showing the data using the abbreviations.

Solution

TL;DR

Store the numbers as numbers! For 'Trace', 'N/A' etc., use negative integers as codes and handle the logic with CASE and company. Or use GENERATED columns and let the server do all the conversions from VARCHAR to a number, so that the mapping only has to be written correctly once. Check out the use of a CASE expression in a GENERATED column definition at the bottom of this answer!

I would do this in one of two ways:

First way (the better of the two - by far!):

You have:

CREATE TABLE nutritional_table 
(
  ..
  ..
  carbohydrate_grams varchar(255) DEFAULT NULL,

The gramme/gram is an SI unit of mass; its symbol is g.

Anything with gramme/gram should be a pure number. The field/column containing it should be suffixed with _g.

Just as a matter of interest, should that not be ... (255) NOT NULL DEFAULT 0? It beats the COALESCE function - I always try to avoid NULLs if possible!

monoinsaturados varchar(255) DEFAULT NULL,
poliinsaturados varchar(255) DEFAULT NULL,

Are these all numbers or percentages or what? If pure numbers or %, then store them as pure numbers, or with _pc as a percent suffix.

cholesterol_milligrams varchar(255) DEFAULT NULL,

One-millionth of a kilogram is 1 mg (one milligram).

The milligramme is a standard SI submultiple (the SI base unit of mass is the kilogram). The suffix should be _mg!

The microgram is typically abbreviated "mcg" in pharmaceutical and nutritional supplement labelling, to avoid confusion, since the "μ" prefix is not always well recognised outside of technical disciplines.

All microgram units should be suffixed with _mcg.

retinol_micrograms varchar(255) DEFAULT NULL,

  PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=598 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Why you go to the trouble of converting the units back and forth as strings is beyond me!

You should also maintain comments on your tables so that poor devs don't get confused and have no excuse for f**ing up! Make it part of SOP that when a field is added, an explanatory comment is added. Also, the tables themselves should have comments, possibly with references about where to find further information (example - of use later on!).

CREATE TABLE nutritional_content 
(
  nutrient_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY comment 'The Primary Key - doh!!!',
  test_measurement_gr VARCHAR (255) comment 'The weight of M&Ms in Outer Space in grammes'
) comment 'This table contains units of measurement as outlined in https://your_server/tables/nutritional_content.html';

Then, one can do stuff like this (table comments):

-- Learnt lots about comments on tables and columns - useful stuff!
-- Check out these links - upvote if you're feeling generous - I was!

-- https://stackoverflow.com/questions/5404051/show-comment-of-fields-from-mysql-table/5404118
-- https://stackoverflow.com/questions/3938966/how-can-i-access-the-table-comment-from-a-mysql-table
-- https://stackoverflow.com/questions/2162420/alter-mysql-table-to-add-comments-on-columns
-- https://dba.stackexchange.com/questions/59587/changing-mysql-table-comment

SELECT table_comment
FROM information_schema.tables
WHERE table_name = 'nutritional_content'
-- AND table_schema = 'my_cool_database'  -- fiddle - don't know schema_name
;

Result:

TABLE_COMMENT
This table contains units of measurement as outlined in https://your_server/tables/nutritional_content.html

Field comments:

SELECT column_name, column_type, column_default, column_comment
FROM information_schema.columns
WHERE table_name = 'nutritional_content'
-- AND table_schema = 'db-name'  -- fiddle - don't know schema_name
;

Result:

COLUMN_NAME          COLUMN_TYPE   COLUMN_DEFAULT  COLUMN_COMMENT
nutrient_id          int(11)                       The Primary Key - doh!!!
test_measurement_gr  varchar(255)                  The weight of M&Ms in Outer Space in grammes

Note the explanation of what units are being used, in case anybody is so slow that they don't realise that the _gr suffix is grammes!

For Trace, 'N/A' &c. - have a code table, say:

CREATE TABLE nutrition_code
(
  code_id  INTEGER      NOT NULL PRIMARY KEY,
  the_code VARCHAR(255) NOT NULL,
  CONSTRAINT code_id_is_negative CHECK (code_id < 0)  -- codes are always negative
);

INSERT INTO nutrition_code VALUES (-1, 'Trace'), (-2, 'N/A');

So, I would imagine that most of the time, you're not too interested in traces or not-applicables? This might introduce a level of complexity, but I still think that it's a win compared to your current scenario. Take a look at the bottom of the fiddle for how to CAST these - very hacky - but then it is MySQL! :-)
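If the app has to translate between the original strings and the negative codes, the mapping can live in one small place. Below is a minimal Python sketch, not part of the original fiddle: the CODES and LABELS dictionaries and the encode, display and total helpers are hypothetical names, and the code values simply mirror the two nutrition_code rows above.

```python
from decimal import Decimal

# Hypothetical mapping; the numbers mirror the nutrition_code rows (-1, -2).
CODES = {"Tr": -1, "NA": -2}
LABELS = {-1: "Trace", -2: "N/A"}


def encode(raw: str) -> Decimal:
    """Turn a source string into a number, using a negative code for special values."""
    raw = raw.strip()
    if raw in CODES:
        return Decimal(CODES[raw])
    # Anything that is neither a known code nor a numeral raises
    # decimal.InvalidOperation, so bad input is not silently accepted.
    return Decimal(raw)


def display(value: Decimal) -> str:
    """Render a stored number, translating negative codes back to their labels."""
    if value < 0:
        return LABELS[int(value)]
    return str(value)


def total(values) -> Decimal:
    """Sum real measurements only; negative codes are skipped."""
    return sum((v for v in values if v >= 0), Decimal(0))


row = [encode(s) for s in ["25.80975", "Tr", "NA", "1.000333333"]]
print([display(v) for v in row])
print(total(row))
```

The point of the sketch is that every calculation has to filter out the negative codes explicitly, which is the extra complexity mentioned above.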

OK, so far, so good - if you want to overhaul your system, which I advise you to at least set as a long term goal, but better to start ASAP!

But, what you can do - if, say, for legacy reasons you have to keep your table the way it is while you change to a reasonable table structure - is this!

The second way (temporary until code refactoring):

This would be to use GENERATED columns (also called COMPUTED columns, or sometimes VIRTUAL columns, although VIRTUAL is a misnomer because in MySQL such columns can be either VIRTUAL or STORED, aka PERSISTENT). As an aside, this is one of the very few ways in which MySQL is actually superior to PostgreSQL, which can only have STORED (i.e. materialised) GENERATED columns for the moment, though I would always recommend PostgreSQL in preference to MySQL (personal opinion!).

What I would recommend then (as a temporary solution) is something like this (fiddle here - all the SQL above is also included in the fiddle):

From the MySQL documentation, the syntax is:

col_name data_type [GENERATED ALWAYS] AS (expr)
  [VIRTUAL | STORED] [NOT NULL | NULL]
  [UNIQUE [KEY]] [[PRIMARY] KEY]

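To implement this, I had to CAST the VARCHAR(255) field to an INTEGER. Note that CAST(... AS UNSIGNED) loses any fractional part, so for decimal values such as '123.5348925' you would cast to DECIMAL with a suitable precision and scale instead. I EVENTUALLY found the following solutions (how do you do it?):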

-- This took me effing ages!

SELECT   CAST(test_measurement_gr               AS UNSIGNED) AS mes_num FROM nutritional_content;
SELECT      (`test_measurement_gr` * 1)         AS mes_num FROM nutritional_content;
SELECT      (`test_measurement_gr` + 0) + 1000  AS mes_num FROM nutritional_content;
SELECT SQRT((`test_measurement_gr` + 0) + 1000) AS mes_num FROM nutritional_content;

-- **Completely** Fu*ked if I know why MySQL does it this way, but c'est la vie!
-- Check out the links to the manuals in the link here:
-- https://stackoverflow.com/questions/12126991/cast-from-varchar-to-int-mysql


Result (1st and last SQL statements only):

The SQRT() function proves that it's acting as a proper number!

mes_num
     10
     12
     26

and

mes_num
31.78049716414141
31.811947441173732
32.03123475609393

So, now I'm ready to create my table with generated fields:

CREATE TABLE nutritional_content_bis 
(
  nutrient_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY comment 'The Primary Key - doh!!!',
  test_measurement_gr VARCHAR (255) comment 'The weight of M&Ms in Outer Space',
  test_measurement_gr_num INTEGER 
    GENERATED ALWAYS AS (CAST(test_measurement_gr AS UNSIGNED)) VIRTUAL  -- or STORED on disk
) comment 'This table contains units of measurement as outlined in https://your_server/tables/nutritional_content.html';

-- you can have it as persistent if you wish to avoid excessive calculations,
-- but you're doing these anyway. Tradeoff between CPU/RAM and HDD space!

Populate it:

INSERT INTO nutritional_content_bis (test_measurement_gr)
VALUES
('10'), ('12'), ('26');

Then check (always check!):

SELECT 
  nutrient_id AS n_id,
  test_measurement_gr AS grammes_txt,
  test_measurement_gr_num AS gramme_num -- Just like any normal field!
FROM nutritional_content_bis;

Result:

n_id    grammes_txt gramme_num
   1    10          10
   2    12          12
   3    26          26

And a double check of our new numeric field (ROUND() and SQRT()):

-- To prove that they're really numbers!
SELECT ROUND(SQRT(test_measurement_gr_num), 2) AS num_test FROM nutritional_content_bis;

Result:

num_test
3.16
3.46
5.10

In addition to greatly simplifying your life, this approach should also be more performant. I would suggest that you perhaps try and do some tests on the different tables and see what you come up with.

For example, if you are asked to get the average positive ion content from Group 1 of the Periodic Table, you just have to add the Na and K content - with your current approach, you have to get your raw data, convert it to a number, add it and obtain your result. A lot of work for nothing.
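To make the comparison concrete, here is a small runnable sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so that the example is self-contained). The table name food, the column names and the sample values are all made up for illustration, and a missing measurement is stored as NULL here for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE food (
        id           INTEGER PRIMARY KEY,
        sodium_mg    REAL,   -- NULL when there is no numeric measurement
        potassium_mg REAL
    )
""")
conn.executemany(
    "INSERT INTO food (sodium_mg, potassium_mg) VALUES (?, ?)",
    [(1.0, 75.0), (2.0, 173.0), (None, 50.0)],  # made-up sample values
)

# With numeric columns the database does the arithmetic directly.
rows = conn.execute(
    "SELECT id, sodium_mg + potassium_mg AS na_plus_k FROM food ORDER BY id"
).fetchall()

# AVG() skips rows where the sum is NULL, i.e. where a measurement is missing.
avg = conn.execute(
    "SELECT AVG(sodium_mg + potassium_mg) FROM food"
).fetchone()[0]

print(rows)
print(avg)
```

No string parsing happens in the application at all; the same SELECT statements should carry over to MySQL once the columns are numeric.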

EDIT: - in response to comments beginning here.

saturados, monoinsaturados and poliinsaturados - Saturates (or saturated_fat), monounsaturates and polyunsaturates.

About the abbreviations you mentioned, like _g instead of grams, i used the full names because i think it is more easily readable. I get a little confused with the abbreviations.

This is why I placed stress on using table and field comments in your particular case.

I would like to upvote, but i cant do this yet

Your vote will be recorded and applied when you have sufficient reputation - I'll help - your question was interesting and provided a challenge, so +1!

I do not "feel so good" in using a value that should be the amount of that nutrient as a flag if it is negative

Neither do I "feel so good" about this particular solution, but it's probably the lesser of two evils - the other way of tackling this problem is to use the EAV anti-pattern, which opens a whole other can of worms! I've also written about it. EAV, to a dev/programmer, is like heroin to a drug addict - things start great, then they spiral out of control until the situation is a complete mess. I always put this image into my mind if ever I'm tempted to use it, even for a "small" (they always grow) configuration table.

About the GENERATED fields, this way i will just lose the information of the special cases not? So it defeats the purpose of using strings at the first place. But, as you said, this is to be used as a temporary solution.

Maybe not so temporary? I didn't know this was possible, but you can do stuff like this (see fiddle - really thrilled I got this to work!!! - definitely a +1 for the question):

CREATE TABLE foo
(
  v1 VARCHAR (255) NOT NULL,
  v2 INTEGER       NOT NULL,
  v3 INTEGER 
    GENERATED ALWAYS AS 
    (
      CASE 
        WHEN v1 = 'Tr' THEN -1
        WHEN v1 = 'N/A' THEN -2  -- can add more - e.g. 'unknown', 'don''t care'...
        ELSE CAST(v1 AS UNSIGNED)
      END
    ) CHECK (v3 > -10 AND v3 < 100000) COMMENT 'The values "Tr" and "N/A" come from table_name...'
) COMMENT 'This table structure rocks!';

Then, perform some inserts:

INSERT INTO foo (v1, v2) VALUES ('123', 1), ('256', 2), ('Tr', -1), ('N/A', -2);

Check:

SELECT 
  v1, 
  v2, 
  v3, 
  ROUND(COALESCE(SQRT(v3), 0), 2) AS sqrt_neg, -- showing off! 
  ROUND(SQRT(ABS(v3)), 2) AS check_num
FROM foo;

Result:

v1  v2   v3 sqrt_neg    check_num
123  1  123    11.09        11.09
256  2  256    16.00        16.00
Tr  -1   -1     0.00         1.00
N/A -2   -2     0.00         1.41

So, with the CASE expression and the CHECK constraint (introduced in MySQL 8.0.16 - a mere 30 years after every other major database server had them!), you have reasonably fine-grained control over what goes into that field!

The only fly in the ointment is that your devs/programmers will have to remember to put the correct field values into the CASE statement - but, I think this is easier than your current situation (IMNSHO) - or using EAV! However, the upside is that it ONLY has to be done correctly ONCE - in the table definition - get that right and you never have to worry about inconsistent data again!

Finally, the icing on the cake:

INSERT INTO foo (v1, v2) VALUES ('shit', 45);

Result:

Truncated incorrect INTEGER value: 'shit'

So, only those strings which are meaningful will be allowed to be INSERTed into your field (this relies on MySQL's strict SQL mode, the default in MySQL 8, which turns the truncation warning into an error). In theory, if subqueries were allowed in CHECK constraints, you would have even more control over your data. It's a pity that MySQL doesn't allow them (nobody apart from Firebird does), but they're probably on the way - but, hey, it only took MySQL 30 years to implement CHECK constraints so who knows?...

OTHER TIPS

  • You could, for the relevant columns, add another column each to hold potential string values and change the existing column to some number type. Whether or not that makes sense, does largely depend on your application. If you want to do calculations at the database layer (or right away at the application layer), actual numbers in the database would help.
  • If NULL, NA and empty strings are in your context essentially the same, use NULL consistently.
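  • If Tr is the only non-numeric value, switch, e.g., to the following (pick a precision and scale that fit your data; a bare NUMERIC in MySQL means NUMERIC(10,0), which holds no decimals):

    riboflavin_milligrams numeric DEFAULT NULL,
    riboflavin_milligrams_trace TINYINT(1) DEFAULT 0,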

  • You could still return either a number (converted to a string) or Tr, if needed:

    CASE riboflavin_milligrams_trace WHEN 1 THEN 'Tr' ELSE CAST(riboflavin_milligrams AS CHAR) END AS riboflavin

  • You might want to ensure, one way or the other, that there is no non-null value in the numeric column if the trace column holds a 1 for TRUE.

  • With the generated columns -thanks to @Vérace for the reminder- there is even the option to combine the string and numeric values already in the database for those cases, where the value is for display only:

    riboflavin_milligrams_char varchar(255) GENERATED ALWAYS AS (CASE riboflavin_milligrams_trace WHEN 1 THEN 'Tr' ELSE CAST(riboflavin_milligrams AS CHAR) END) VIRTUAL,

  • Given OP's preference for a single string column to store potential values for a particular attribute, a generated column could provide the resulting numbers (only - not added magic values):

    riboflavin_milligrams varchar(10) DEFAULT NULL,
    riboflavin_milligrams_number float GENERATED ALWAYS AS (CONVERT(riboflavin_milligrams, float)) VIRTUAL,

See it in action: db<>fiddle (Using your script with some adjustments.)
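
The flag-column tips can also be tried without a MySQL server. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in (an assumption; SQLite's types and error messages differ from MySQL's). The table name food is hypothetical, and the CHECK constraint is one possible way to ensure that a trace row never carries a number as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE food (
        id INTEGER PRIMARY KEY,
        riboflavin_milligrams REAL,
        riboflavin_milligrams_trace INTEGER NOT NULL DEFAULT 0,
        -- a row flagged as trace must not also hold a numeric value
        CHECK (riboflavin_milligrams_trace = 0 OR riboflavin_milligrams IS NULL)
    )
""")
conn.execute("INSERT INTO food (riboflavin_milligrams) VALUES (0.08)")
conn.execute("INSERT INTO food (riboflavin_milligrams_trace) VALUES (1)")

# A row that is both a trace and a number violates the CHECK constraint.
try:
    conn.execute(
        "INSERT INTO food (riboflavin_milligrams, riboflavin_milligrams_trace) "
        "VALUES (0.5, 1)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Rebuild the display string from the number and the flag.
shown = conn.execute("""
    SELECT CASE riboflavin_milligrams_trace
             WHEN 1 THEN 'Tr'
             ELSE CAST(riboflavin_milligrams AS TEXT)
           END AS riboflavin
    FROM food
    ORDER BY id
""").fetchall()

print(rejected)
print(shown)
```

In MySQL 8.0.16 or later the same idea would be a CHECK constraint on the table, with CAST(... AS CHAR) in place of CAST(... AS TEXT).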

Please comment, if and as this requires adjustment / further detail.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange