Will column size change when trying to use single unicode character in place of multiple ASCII characters?

dba.stackexchange https://dba.stackexchange.com/questions/90550

Question

I have a TEXT column with the utf8_unicode_ci collation in MySQL.

The content is in Unicode, and I have used tags like {br}, {v}, {t}, {h}, etc. throughout the database. I am now thinking of changing these to single characters from the Unicode table, like ⓑ, ⓥ, ⓣ, Ⓗ and so on.

Will this have any effect on the size of the table (increase or decrease)? Should I be worried about any other side effects? Would the answer be different if the column type were VARCHAR?


Answer

"{br}" currently takes 4 bytes in ascii, latin1, utf8, utf8mb4 and perhaps other CHARACTER SETs. (The "COLLATION" is irrelevant.)

ⓑ (U+24D1) takes 3 bytes in utf8: hex E2 93 91. See http://www.fileformat.info/info/unicode/char/24d1/index.htm

The size of the table will not change much. Each {br} will shrink by 1 byte (4 bytes down to 3). A 3-character tag such as {v} is already 3 bytes, so replacing it with ⓥ saves nothing.
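The byte counts above can be checked outside MySQL. Here is a minimal Python sketch, assuming Python's UTF-8 encoder produces the same bytes as MySQL's utf8 for these characters (it should, since they are all in the Basic Multilingual Plane). The tag-to-symbol pairs come from the question; the helper names `utf8_len` and `bytes_saved` are made up for illustration.

```python
# Tag-to-symbol pairs taken from the question.
REPLACEMENTS = {"{br}": "ⓑ", "{v}": "ⓥ", "{t}": "ⓣ", "{h}": "Ⓗ"}


def utf8_len(text):
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def bytes_saved(tag, symbol):
    """Bytes saved per occurrence when the tag is replaced by the symbol."""
    return utf8_len(tag) - utf8_len(symbol)


if __name__ == "__main__":
    for tag, symbol in REPLACEMENTS.items():
        print(
            f"{tag}: {utf8_len(tag)} bytes, "
            f"{symbol}: {utf8_len(symbol)} bytes, "
            f"saved: {bytes_saved(tag, symbol)}"
        )
```

Only {br} saves anything, and only 1 byte per occurrence, which is why the change is hard to justify on size alone.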

If this is an optimization, it is probably not worth doing.

The field (either TEXT or VARCHAR) must be declared CHARACTER SET utf8 (or utf8mb4). The client must be using utf8. You should probably do SET NAMES utf8.

There are obscure advantages to VARCHAR over TEXT. Note that a VARCHAR(100) can hold 100 characters; with CHARSET utf8, that is up to 300 bytes. {br} is 4 chars and 4 bytes; ⓑ is 1 char and 3 bytes. The (100) counts chars; disk space counts bytes.
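To illustrate the chars-versus-bytes distinction, here is a rough Python sketch. It assumes that Python's `len()` on a string (code points) matches MySQL's character count for these values, and that UTF-8 byte length approximates the data bytes stored, ignoring the length prefix and row overhead. The function names are hypothetical.

```python
def char_and_byte_counts(value):
    """Return (characters, UTF-8 bytes) for a string."""
    return len(value), len(value.encode("utf-8"))


def fits_varchar(value, max_chars):
    """A VARCHAR(n) limit is checked in characters, not bytes."""
    return len(value) <= max_chars


if __name__ == "__main__":
    for sample in ("{br}" * 25, "ⓑ" * 100):
        chars, size = char_and_byte_counts(sample)
        print(f"{chars} chars, {size} bytes, fits VARCHAR(100): {fits_varchar(sample, 100)}")
```

Both samples fill a VARCHAR(100) exactly, yet the first needs 100 bytes and the second 300. Replacing tags with circled letters therefore frees up character capacity in a VARCHAR much faster than it frees disk space.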

See the following blog for too much more info: http://mysql.rjweb.org/doc.php/charcoll

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange