Question

I am using char_length() to measure the size of "Русский": strangely, instead of telling me that it's 7 chars, it tells me there are 14. Interestingly if the query is simply...

SELECT CHAR_LENGTH('Русский')

...the answer is correct. However if I query the DB instead, the anser is 14:

SELECT CHAR_LENGTH(text) FROM locales WHERE lang = 'ru-RU' AND name = 'lang_name'

Anybody go any ideas what I might be doing wrong? I can confirm that the collation is utf8_general_ci and the table is MyISAM

Thanks, Adrien

EDIT: My end objective is to be able to measure the lengths of records in a table containing single and double-byte chracters (eg. English & Russian, but not limited to these two languages only)

Was it helpful?

Solution

Because of two bytes is used for each UTF8 char. See http://dev.mysql.com/doc/refman/5.5/en/string-functions.html#function_char-length

mysql> set names utf8;
mysql> SELECT CHAR_LENGTH('Русский'); result - 7
mysql> SELECT CHAR_LENGTH('test'); result  - 4

create table test123 (
text VARCHAR(255) NOT NULL DEFAULT '',
text_text TEXT) Engine=Innodb default charset=UTF8;

insert into test123 VALUES('русский','test русский');

SELECT CHAR_LENGTH(text),CHAR_LENGTH(text_text) from test123; result - 7 and 12

I have tested work with: set names koi8r; create table and so on and got invalid result. So the solution is recreate table and insert all data after setting set names UTF8.

OTHER TIPS

the function return it's anwser guided by the most adjacent charset avaiable
in the case of a column, the column definition
in the case of a literal, the connection default
review the column charset with:

SELECT CHARACTER_SET_NAME FROM information_schema.`COLUMNS` 
where table_name = 'locales'
and column_name = 'text'

be careful, it is not filtered by table_schema

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top