Question

I am writing unit tests for query generators, and I ran into trouble with the LENGTH function.

I run two queries, one after the other:

SHOW VARIABLES LIKE '%character%'

It returns the following result:

array(8) {
  [0] =>
  array(2) {
    'Variable_name' =>
    string(20) "character_set_client"
    'Value' =>
    string(4) "utf8"
  }
  [1] =>
  array(2) {
    'Variable_name' =>
    string(24) "character_set_connection"
    'Value' =>
    string(4) "utf8"
  }
  [2] =>
  array(2) {
    'Variable_name' =>
    string(22) "character_set_database"
    'Value' =>
    string(6) "latin1"
  }
  [3] =>
  array(2) {
    'Variable_name' =>
    string(24) "character_set_filesystem"
    'Value' =>
    string(6) "binary"
  }
  [4] =>
  array(2) {
    'Variable_name' =>
    string(21) "character_set_results"
    'Value' =>
    string(4) "utf8"
  }
  [5] =>
  array(2) {
    'Variable_name' =>
    string(20) "character_set_server"
    'Value' =>
    string(4) "utf8"
  }
  [6] =>
  array(2) {
    'Variable_name' =>
    string(20) "character_set_system"
    'Value' =>
    string(4) "utf8"
  }
  [7] =>
  array(2) {
    'Variable_name' =>
    string(18) "character_sets_dir"
    'Value' =>
    string(26) "/usr/share/mysql/charsets/"
  }
}

My second query is:

SELECT LENGTH('重庆') as len

It returns 6 instead of 2.

What is wrong here? My charset parameters look good.

Solution

I found my answer in the MySQL documentation:

The LENGTH function counts bytes:

mysql> SELECT LENGTH('重庆') ;
+------------------+
| LENGTH('重庆')   |
+------------------+
|                6 |
+------------------+
1 row in set (0.00 sec)

The CHAR_LENGTH function counts characters:

mysql> SELECT CHAR_LENGTH('重庆') ;
+-----------------------+
| CHAR_LENGTH('重庆')   |
+-----------------------+
|                     2 |
+-----------------------+
1 row in set (0.00 sec)

Other tips

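The two functions measure different things:

LENGTH() always returns the length of the string in bytes. CHAR_LENGTH() returns the length of the string in characters.

With a single-byte character set such as latin1 the two results are the same. With a multibyte character set they differ whenever the string contains a character that takes more than one byte. In a fixed-width two-byte encoding such as ucs2, every character takes two bytes. In UTF-8 the number of bytes per character varies: ASCII characters take one byte, while each of the two Chinese characters here takes three, which is why LENGTH() returns 6.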

e.g.:

SELECT LENGTH('重庆'), CHAR_LENGTH('重庆');
-->   6,  2  
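
The same distinction can be reproduced outside MySQL. The following is a minimal Python sketch, not part of the original answers; the helper name `byte_and_char_lengths` is made up for illustration. It counts the characters of a string and the bytes of its encoded form, which is roughly what CHAR_LENGTH() and LENGTH() report for a string in a given character set.

```python
def byte_and_char_lengths(text, encoding="utf-8"):
    """Return (byte_count, char_count) for text in the given encoding.

    Hypothetical helper for illustration: byte_count mirrors what LENGTH()
    counts, char_count mirrors what CHAR_LENGTH() counts.
    """
    byte_count = len(text.encode(encoding))  # size of the encoded byte string
    char_count = len(text)                   # number of Unicode code points
    return byte_count, char_count


# Each of these two characters takes three bytes in UTF-8.
utf8_result = byte_and_char_lengths("重庆", "utf-8")

# In UTF-16 (big-endian, no byte order mark) each takes two bytes.
utf16_result = byte_and_char_lengths("重庆", "utf-16-be")

# Plain ASCII text takes one byte per character in UTF-8.
ascii_result = byte_and_char_lengths("abc", "utf-8")

print(utf8_result, utf16_result, ascii_result)
```

For a check on the character count, as in the unit test from the question, CHAR_LENGTH() is the function to compare against; LENGTH() is the one to use when the storage size in bytes matters.
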
Licensed under: CC-BY-SA with attribution