PROCEDURE ANALYSE suggests turning TIMESTAMP into CHAR(19)
Question
Running PROCEDURE ANALYSE() under MySQL 5.6.30 on a table with about 4 million rows produces the following recommendations:
- change timestamp to char(19)
- tinyint to ENUM (I noticed it's overzealous with ENUMS in general)
- change varchar to char(N)
I can see the point of changing varchar to char when appropriate, but the first two recommendations strike me as odd. Thoughts?
Solution
change timestamp to char(19)
String comparison is more streamlined and accurate than comparing a timestamp that may or may not take the timezone into account. As a super bizarre example, see the StackOverflow post from Jon Skeet.
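Why string comparison works here: a zero-padded 'YYYY-MM-DD HH:MM:SS' value (exactly 19 characters, hence CHAR(19)) sorts lexicographically in the same order as the instants it names, as long as every value is written in one and the same time zone. The Python sketch below checks that with a few made-up sample values; it illustrates the string property only and does not model MySQL itself.

```python
from datetime import datetime

# Hypothetical sample values, all assumed to be in the same time zone.
samples = [
    datetime(2016, 12, 31, 23, 59, 59),
    datetime(2016, 1, 2, 3, 4, 5),
    datetime(2009, 9, 9, 9, 9, 9),
    datetime(2016, 1, 2, 3, 4, 6),
]

FMT = "%Y-%m-%d %H:%M:%S"  # the 19-character layout a CHAR(19) column would hold

as_strings = [d.strftime(FMT) for d in samples]

# Every rendered value is exactly 19 characters wide.
assert all(len(s) == 19 for s in as_strings)

# Sorting the strings gives the same order as sorting the datetimes, because
# each field is zero-padded and fields run from most to least significant.
by_string = sorted(as_strings)
by_time = [d.strftime(FMT) for d in sorted(samples)]
print(by_string == by_time)  # True
```

The string carries no UTC offset, so the equivalence only holds while all values are written in one time zone.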
tinyint to ENUM (I noticed it's overzealous with ENUMS in general)
TINYINT is an integer type with the range -128 to 127 (signed) or 0 to 255 (UNSIGNED). Either way, it can represent 256 distinct values.
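The 256 figure follows from TINYINT occupying one byte (8 bits). A small illustrative Python helper (the name `int_range` is made up for this sketch) computes the signed and unsigned ranges for an integer of a given byte width:

```python
def int_range(num_bytes: int, signed: bool) -> tuple[int, int]:
    """Return (min, max) for a two's-complement integer of num_bytes bytes."""
    bits = 8 * num_bytes
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


lo, hi = int_range(1, signed=True)    # TINYINT
print(lo, hi, hi - lo + 1)            # -128 127 256
lo, hi = int_range(1, signed=False)   # TINYINT UNSIGNED
print(lo, hi, hi - lo + 1)            # 0 255 256
```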
An ENUM can represent roughly 256 times as many values (up to 65,535 distinct values).
If the number of distinct values your table has for the TINYINT is large (more than 10), PROCEDURE ANALYSE() is simply suggesting ENUM in case the number of distinct values rises in the future. Internally, an ENUM goes from 1 byte to 2 bytes once the number of distinct values exceeds 255 (see Data Type Storage Requirements under the heading Storage Requirements for String Types for more info).
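A sketch of the per-row cost of an ENUM, assuming MySQL's documented limits: 1 byte for up to 255 members (code 0 is reserved for the empty error value, so one byte leaves 255 usable codes) and 2 bytes beyond that, up to the 65,535-member maximum. The helper name is hypothetical; MySQL does this bookkeeping internally.

```python
MAX_ENUM_MEMBERS = 65_535  # documented upper limit on ENUM members


def enum_storage_bytes(num_members: int) -> int:
    """Bytes per row for an ENUM column with num_members members (sketch)."""
    if not 1 <= num_members <= MAX_ENUM_MEMBERS:
        raise ValueError("an ENUM needs between 1 and 65,535 members")
    # Index 0 is reserved for the empty "error" value, so one byte
    # (256 codes) covers at most 255 real members.
    return 1 if num_members <= 255 else 2


for n in (10, 255, 256, 65_535):
    print(n, enum_storage_bytes(n))  # 10 1 / 255 1 / 256 2 / 65535 2
```

Compared with a TINYINT, which is always 1 byte, the ENUM only costs more once it passes 255 members.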
If you never have to increase the number of distinct values, you can ignore this suggestion.
I would urge you never to use ENUM anyway. See my post Advantages and Disadvantages to using ENUM vs Integer types?
change varchar to char(N)
This depends on your need for speed or space:
- VARCHAR is great for keeping a table compact.
- CHAR is better for speed when doing string comparison and processing.
See my posts.
Following this suggestion (or not) is totally at your discretion
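For the space side of the trade-off, here is a rough Python estimate of the data bytes a column needs. It assumes a single-byte character set such as latin1, a storage format that keeps CHAR(N) at its full declared width, and VARCHAR storing the actual bytes plus a 1- or 2-byte length prefix. The sample values are hypothetical, and row headers and page overhead are ignored.

```python
def char_column_bytes(values: list[str], n: int) -> int:
    """Total bytes if every value is padded to CHAR(n) (single-byte charset)."""
    return n * len(values)


def varchar_column_bytes(values: list[str], n: int) -> int:
    """Total bytes for VARCHAR(n): actual byte length plus a length prefix."""
    # With a single-byte charset the column's maximum byte length equals n.
    prefix = 1 if n <= 255 else 2
    return sum(len(v.encode("latin-1")) + prefix for v in values)


names = ["Al", "Bartholomew", "Cy", "Dorothea"]  # varying lengths
codes = ["US", "DE", "FR", "JP"]                 # truly fixed length

print(char_column_bytes(names, 20), varchar_column_bytes(names, 20))  # 80 27
print(char_column_bytes(codes, 2), varchar_column_bytes(codes, 2))    # 8 12
```

With varying lengths VARCHAR is far more compact; with genuinely fixed-length codes, CHAR saves the prefix byte.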
OTHER TIPS
CHAR has a nasty side. ANALYSE() predates character sets, and its code was probably not updated to take into account that English text in a CHAR(...) utf8mb4 column wastes 3/4 of the space! Also, CHAR had some utility with MyISAM's ROW_FORMAT=FIXED, but that is now useless with InnoDB, the default engine.
Bottom line: ignore its advice about CHAR unless you (1) really have fixed-length strings and (2) have specified an appropriate CHARACTER SET.
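To put a number on the 3/4 figure: if CHAR(N) reserves its full maximum width (as in MyISAM fixed-length rows), a utf8mb4 column sets aside 4 bytes per character, while plain English text needs only 1 byte per character. This Python sketch assumes that full-width reservation, so treat it as a worst case; row formats that store multibyte CHAR more compactly will waste less.

```python
# Maximum bytes per character for a few MySQL character sets
# ("utf8" in MySQL 5.6 is the 3-byte utf8mb3).
MAX_BYTES_PER_CHAR = {"ascii": 1, "latin1": 1, "utf8": 3, "utf8mb4": 4}


def reserved_bytes(n: int, charset: str) -> int:
    # Assumes CHAR(n) always reserves its full maximum width.
    return n * MAX_BYTES_PER_CHAR[charset]


def wasted_fraction(text: str, n: int, charset: str) -> float:
    # For plain ASCII text, UTF-8, latin1 and ascii all use 1 byte per character.
    used = len(text.encode("utf-8"))
    return 1 - used / reserved_bytes(n, charset)


word = "fixedtext!"  # 10 ASCII characters stored in a CHAR(10)
for cs in ("utf8mb4", "utf8", "ascii"):
    print(cs, reserved_bytes(10, cs), round(wasted_fraction(word, 10, cs), 3))
# utf8mb4 40 0.75
# utf8 30 0.667
# ascii 10 0.0
```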
I would not change TIMESTAMP to CHAR(19) because (1) the space would explode under utf8, (2) you would lose the timezone adjustment that currently exists in the table, (3) CHAR(19) is a lot bigger than the footprint of TIMESTAMP (4 bytes before MySQL 5.6.4, now 4 to 7 bytes depending on fractional-second precision), and (4) ANALYSE() probably has not been updated to understand microseconds.
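The size argument can be made concrete with a rough Python sketch. It assumes the MySQL 5.6.4+ storage rules (TIMESTAMP: 4 bytes, DATETIME: 5 bytes, each plus 0 to 3 bytes for fractional seconds) and that CHAR(19) reserves its full width at the character set's maximum bytes per character. The helper names are made up for illustration.

```python
def fsp_bytes(fsp: int) -> int:
    """Extra bytes for fractional seconds with precision fsp (0-6), MySQL 5.6.4+."""
    if not 0 <= fsp <= 6:
        raise ValueError("fsp must be between 0 and 6")
    return (fsp + 1) // 2  # 0 -> 0, 1-2 -> 1, 3-4 -> 2, 5-6 -> 3


def timestamp_bytes(fsp: int = 0) -> int:
    return 4 + fsp_bytes(fsp)


def datetime_bytes(fsp: int = 0) -> int:
    return 5 + fsp_bytes(fsp)


def char19_bytes(max_bytes_per_char: int) -> int:
    # Worst case: CHAR(19) reserves its full width.
    return 19 * max_bytes_per_char


print(timestamp_bytes(0), timestamp_bytes(6))              # 4 7
print(datetime_bytes(0), datetime_bytes(6))                # 5 8
print(char19_bytes(1), char19_bytes(3), char19_bytes(4))   # 19 57 76

# With microseconds the text form no longer fits in 19 characters.
print(len("2016-05-01 12:34:56.123456"))                   # 26
```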
DATETIME, on the other hand, does not have the timezone issue, but points 1, 3, and 4 still apply.
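To see what the timezone adjustment means, here is a simplified Python model (not MySQL code): a TIMESTAMP is converted from the session time zone to UTC when written and back to the reader's session time zone when read, whereas a DATETIME or CHAR(19) string comes back exactly as written. Fixed UTC offsets stand in for session `time_zone` settings, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"


def write_timestamp(local_text: str, session_tz: timezone) -> datetime:
    """TIMESTAMP model: interpret text in the session zone, store as UTC."""
    local = datetime.strptime(local_text, FMT).replace(tzinfo=session_tz)
    return local.astimezone(timezone.utc)


def read_timestamp(stored_utc: datetime, session_tz: timezone) -> str:
    """TIMESTAMP model: convert stored UTC back to the reader's session zone."""
    return stored_utc.astimezone(session_tz).strftime(FMT)


writer_tz = timezone(timedelta(hours=2))    # a session at +02:00
reader_tz = timezone(timedelta(hours=-5))   # a session at -05:00

stored = write_timestamp("2016-05-01 12:00:00", writer_tz)
print(read_timestamp(stored, reader_tz))    # 2016-05-01 05:00:00

# DATETIME / CHAR(19) model: the text is stored and returned unchanged.
datetime_value = "2016-05-01 12:00:00"
print(datetime_value)                       # 2016-05-01 12:00:00
```

The same instant comes back as a different wall-clock string under TIMESTAMP; that conversion is what is given up by switching to CHAR(19), and its absence is why DATETIME has no timezone issue.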