Question

Running PROCEDURE ANALYSE() under MySQL 5.6.30 on a table with about 4 million rows produces the following recommendations:

  1. change timestamp to char(19)
  2. tinyint to ENUM (I noticed it's overzealous with ENUMS in general)
  3. change varchar to char(N)

I can see the point of changing varchar to char when appropriate, but the first two recommendations strike me as odd. Thoughts?


Solution

change timestamp to char(19)

String comparison is more streamlined and accurate than comparing a timestamp that may or may not take the time zone into account. For a truly bizarre example, see the StackOverflow post from Jon Skeet.
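
A minimal Python sketch (a model, not MySQL itself) of why CHAR(19) comparison works: the 'YYYY-MM-DD HH:MM:SS' layout is zero-padded and ordered from year down to second, so sorting the strings gives chronological order. It also shows the flip side: the string records no time zone, so one instant rendered for two zones yields two different strings. The helper `to_char19` and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"  # 19 characters, e.g. 2016-04-01 09:30:00


def to_char19(dt: datetime) -> str:
    """Hypothetical helper: render a datetime the way a CHAR(19) column would hold it."""
    return dt.strftime(FMT)


base = datetime(2016, 4, 1, 9, 30)
moments = [base + timedelta(days=d, hours=h) for d, h in [(3, 0), (0, 5), (40, -2), (0, 0)]]
strings = [to_char19(m) for m in moments]

# Zero-padded fields ordered year -> second: string order equals chronological order.
print(sorted(strings) == [to_char19(m) for m in sorted(moments)])  # True

# The text carries no time zone: one instant, two different strings.
instant = datetime(2016, 4, 1, 12, 0, tzinfo=timezone.utc)
utc_text = to_char19(instant)
tokyo_text = to_char19(instant.astimezone(timezone(timedelta(hours=9))))
print(utc_text, "|", tokyo_text)  # 2016-04-01 12:00:00 | 2016-04-01 21:00:00
```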

tinyint to ENUM (I noticed it's overzealous with ENUMS in general)

TINYINT is an integer with the range -128 to 127 (0 to 255 if UNSIGNED).

That gives you 256 distinct values.

An ENUM can represent roughly 256 times as many (up to 65,535 distinct values).

If the number of distinct values in the TINYINT column is large (more than 10), PROCEDURE ANALYSE() is simply suggesting ENUM in case the number of distinct values rises in the future. Internally, an ENUM grows from 1 byte to 2 bytes once it has to list more than 255 members (see Data Type Storage Requirements, under the heading Storage Requirements for String Types, for more info).
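
The storage arithmetic above can be sketched in Python. `fits_tinyint` and `enum_storage_bytes` are hypothetical helpers; the 255-member threshold and the 65,535-member maximum follow MySQL's documented ENUM storage (1 or 2 bytes), but treat this as an illustration rather than a description of server internals.

```python
import struct

ENUM_MAX_MEMBERS = 65_535  # documented maximum number of ENUM elements


def fits_tinyint(value: int, unsigned: bool = False) -> bool:
    """True if `value` fits in one byte: -128..127 signed, 0..255 UNSIGNED."""
    fmt = "B" if unsigned else "b"  # struct codes for unsigned/signed char
    try:
        struct.pack(fmt, value)
    except struct.error:
        return False
    return True


def enum_storage_bytes(member_count: int) -> int:
    """Hypothetical helper: bytes per value for an ENUM listing `member_count` members."""
    if not 1 <= member_count <= ENUM_MAX_MEMBERS:
        raise ValueError("an ENUM lists between 1 and 65,535 members")
    return 1 if member_count <= 255 else 2


# One byte covers exactly 256 distinct TINYINT values.
print(sum(fits_tinyint(v) for v in range(-300, 300)))  # 256
print(enum_storage_bytes(255), enum_storage_bytes(256))  # 1 2
```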

If you never have to increase the number of distinct values, you can ignore this suggestion.

I beg you never to use ENUM anyway; see my post "Advantages and Disadvantages to using ENUM vs Integer types?"

change varchar to char(N)

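This depends on whether you need speed or space: CHAR(N) is fixed-width, while VARCHAR stores only the bytes actually used plus a 1- or 2-byte length prefix.

Following this suggestion (or not) is entirely at your discretion.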
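
To make the space side of the trade-off concrete, here is a rough Python model of per-value storage. It assumes CHAR(N) reserves N times the character set's maximum bytes per character (as with MyISAM fixed-format rows), and that VARCHAR stores the encoded bytes plus a 1-byte length prefix when the column's maximum byte length is at most 255, otherwise 2 bytes. The function names are hypothetical, and speed is not modeled.

```python
# Maximum bytes per character for a few MySQL character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def char_bytes(n: int, charset: str) -> int:
    """Bytes reserved by CHAR(n), assuming a fixed-width reservation per row."""
    return n * MAX_BYTES_PER_CHAR[charset]


def varchar_bytes(value: str, n: int, charset: str) -> int:
    """Bytes used by one VARCHAR(n) value: encoded data plus a length prefix."""
    if len(value) > n:
        raise ValueError("value is longer than the declared VARCHAR length")
    max_bytes = n * MAX_BYTES_PER_CHAR[charset]
    prefix = 1 if max_bytes <= 255 else 2
    encoding = "latin-1" if charset == "latin1" else "utf-8"
    return len(value.encode(encoding)) + prefix


# Truly fixed-length data: CHAR wins by the prefix byte.
print(char_bytes(2, "latin1"), varchar_bytes("US", 2, "latin1"))     # 2 3
# Short values in a wide column: VARCHAR wins by a lot.
print(char_bytes(50, "latin1"), varchar_bytes("Ann", 50, "latin1"))  # 50 4
```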

OTHER TIPS

CHAR has a nasty side. ANALYSE() predates character sets, and its code was probably not updated to take into account that English text in a CHAR(...) utf8mb4 column wastes 3/4 of the space, since each character reserves 4 bytes while English letters need only 1! Also, CHAR had some utility with MyISAM's ROW_FORMAT=FIXED, but that is now useless in InnoDB, the default engine.

Bottom line: ignore its advice about CHAR unless you (1) really have fixed-length strings and (2) have specified an appropriate CHARACTER SET.
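
A quick Python check of the "3/4 wasted" figure, assuming a fixed reservation of the character set's maximum bytes per character (4 for utf8mb4) and restricting input to ASCII text, where every character encodes to 1 byte. `wasted_fraction` is a hypothetical helper; InnoDB's row formats may store CHAR differently, so actual numbers there can differ.

```python
from fractions import Fraction

MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def wasted_fraction(text: str, n: int, charset: str) -> Fraction:
    """Share of a CHAR(n) reservation left unused by ASCII `text`."""
    if not text.isascii() or len(text) > n:
        raise ValueError("expects ASCII text no longer than n characters")
    reserved = n * MAX_BYTES_PER_CHAR[charset]  # assumed fixed reservation
    used = len(text)  # ASCII: exactly 1 byte per character in all three charsets
    return 1 - Fraction(used, reserved)


word = "DATABASES!"  # 10 ASCII characters filling CHAR(10)
for cs in ("latin1", "utf8", "utf8mb4"):
    print(cs, wasted_fraction(word, 10, cs))
# latin1 0 / utf8 2/3 / utf8mb4 3/4
```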

I would not change TIMESTAMP to CHAR(19) because (1) of the explosion of space due to utf8, (2) you would lose the timezone adjustment that currently exists in the table, (3) CHAR(19) is a lot bigger than the footprint of TIMESTAMP (4 bytes, plus up to 3 bytes for fractional seconds since MySQL 5.6.4), and (4) ANALYSE() probably has not been updated to understand microseconds.

DATETIME, on the other hand, does not have the timezone issue, but points 1, 3, and 4 still apply.
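
The size comparison in points (1) and (3) can be tabulated with a short Python sketch. It assumes MySQL 5.6.4+ storage rules (TIMESTAMP 4 bytes, DATETIME 5 bytes, each plus (fsp + 1) // 2 bytes for fractional-seconds precision fsp) and assumes a CHAR column reserves its full width times the character set's maximum bytes per character. All function names are hypothetical.

```python
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def fractional_bytes(fsp: int) -> int:
    """Extra bytes for fractional-seconds precision fsp (0-6), MySQL 5.6.4+."""
    if not 0 <= fsp <= 6:
        raise ValueError("fsp must be between 0 and 6")
    return (fsp + 1) // 2


def timestamp_bytes(fsp: int = 0) -> int:
    return 4 + fractional_bytes(fsp)


def datetime_bytes(fsp: int = 0) -> int:
    return 5 + fractional_bytes(fsp)


def char_datetime_bytes(fsp: int, charset: str) -> int:
    """Bytes reserved by a CHAR wide enough for 'YYYY-MM-DD HH:MM:SS[.ffffff]'."""
    width = 19 + (fsp + 1 if fsp else 0)  # decimal point plus fsp digits
    return width * MAX_BYTES_PER_CHAR[charset]


for fsp in (0, 6):
    print(fsp, timestamp_bytes(fsp), datetime_bytes(fsp),
          [char_datetime_bytes(fsp, cs) for cs in MAX_BYTES_PER_CHAR])
# 0 4 5 [19, 57, 76]
# 6 7 8 [26, 78, 104]
```

Note that with microseconds the text form needs CHAR(26), not CHAR(19), which is where point (4) bites.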

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange