Pregunta

I'm going to store data (mostly Wikipedia page titles) in a table, that can contain characters for which full UTF8 is needed. The schema I'm using is

CREATE TABLE `en_brands` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `name` varchar(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
 `name_encoded` varchar(255) NOT NULL,
 `inserted` datetime NOT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `name` (`name`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT

As you can see, name is only 191 characters long. When using larger values, then MySQL refuses the creation of UNIQUE KEY name because such keys can only be 767 bytes long. What are my questions:

  • has name_encoded to be at least TEXT to fully store an URL-encoded (in PHP rawurlencode) UTF8 string? (I think an 255 byte long string could be represented by a 3060 character long URL in worst case - 255 chars x 4 bytes x 3 chars for encoded representation)
  • does it matter which collation I use for name_encoded (I think not, because URL-encoded they should fit into latin)
  • which data type and collation should I use for name to store at least 255 characters with full UTF8 support and to create an UNIQUE KEY nevertheless (I'd like to use a collation which allows native language sorting)

BTW: I'm using MySQL 5.6 Percona on Debian Wheezy

¿Fue útil?

Solución

To answer my question:

has name_encoded to be at least TEXT to fully store an URL-encoded (in PHP rawurlencode) UTF8 string?

for sure, an URL encoded UTF8 string can be up to 3060 characters long, so TEXT is required

does it matter which collation I use for name_encoded (I think not, because URL-encoded they should fit into latin)

URL encoded string fits into ASCII

which data type and collation should I use for name to store at least 255 characters with full UTF8 support and to create an UNIQUE KEY nevertheless

this is not possible, because of length restriction for text columns in UNIQUE KEYs

I'm checking for duplicates by executing a SELECT first, so there is no need for an UNIQUE KEY, but application has to ensure data integrity

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top