Question

I am unit testing query generators, and I have run into trouble with the LENGTH function.

I run two queries, one after the other:

SHOW VARIABLES LIKE '%character%'

It returns the following result:

array(8) {
  [0] =>
  array(2) {
    'Variable_name' =>
    string(20) "character_set_client"
    'Value' =>
    string(4) "utf8"
  }
  [1] =>
  array(2) {
    'Variable_name' =>
    string(24) "character_set_connection"
    'Value' =>
    string(4) "utf8"
  }
  [2] =>
  array(2) {
    'Variable_name' =>
    string(22) "character_set_database"
    'Value' =>
    string(6) "latin1"
  }
  [3] =>
  array(2) {
    'Variable_name' =>
    string(24) "character_set_filesystem"
    'Value' =>
    string(6) "binary"
  }
  [4] =>
  array(2) {
    'Variable_name' =>
    string(21) "character_set_results"
    'Value' =>
    string(4) "utf8"
  }
  [5] =>
  array(2) {
    'Variable_name' =>
    string(20) "character_set_server"
    'Value' =>
    string(4) "utf8"
  }
  [6] =>
  array(2) {
    'Variable_name' =>
    string(20) "character_set_system"
    'Value' =>
    string(4) "utf8"
  }
  [7] =>
  array(2) {
    'Variable_name' =>
    string(18) "character_sets_dir"
    'Value' =>
    string(26) "/usr/share/mysql/charsets/"
  }
}

My second query is:

SELECT LENGTH('重庆') as len

It returns 6 instead of 2.

What's wrong here? My charset settings look correct.


Solution

I found the answer in the MySQL documentation:

The LENGTH function counts bytes:

mysql> SELECT LENGTH('重庆') ;
+------------------+
| LENGTH('重庆')   |
+------------------+
|                6 |
+------------------+
1 row in set (0.00 sec)

The CHAR_LENGTH function counts characters:

mysql> SELECT CHAR_LENGTH('重庆') ;
+-----------------------+
| CHAR_LENGTH('重庆')   |
+-----------------------+
|                     2 |
+-----------------------+
1 row in set (0.00 sec)
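
The same difference can be reproduced outside MySQL. The following is a minimal Python sketch, not taken from the MySQL documentation. It assumes the string is encoded as UTF-8, matching the character_set_client and character_set_connection values shown in the question; the variable names are illustrative only.

```python
# Count characters versus bytes for the string used in the question.
text = "重庆"

# len() on a Python str counts Unicode code points,
# which corresponds to what CHAR_LENGTH() reports for this string.
char_count = len(text)

# Encoding to UTF-8 first and then counting gives the byte length,
# which corresponds to what LENGTH() reports on a UTF-8 connection.
byte_count = len(text.encode("utf-8"))

print(char_count, byte_count)  # prints: 2 6
```

Each of the two characters occupies three bytes in UTF-8, so the byte count is 6 while the character count is 2.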

Other tips

The two functions behave quite differently:

LENGTH() always returns the length of the string in bytes, whereas CHAR_LENGTH() returns its length in characters.

With a multibyte character set, the two results differ as soon as the string contains a character that takes more than one byte. In a two-byte encoding such as UCS-2, every character takes two bytes. In UTF-8 the number of bytes per character varies: ASCII characters take one byte, while each of the two Chinese characters here takes three, which is why LENGTH() returns 6.

For example:

SELECT LENGTH('重庆'), CHAR_LENGTH('重庆');
-->   6,  2  
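
To see why the byte count depends on which characters are involved, here is a small Python sketch. The helper name utf8_width and the sample characters are chosen purely for illustration and are not part of MySQL. It measures how many bytes individual characters occupy in UTF-8 and, for comparison, how many bytes the string from the question takes in UTF-16.

```python
def utf8_width(ch: str) -> int:
    """Return the number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# Sample characters, written as escapes to avoid ambiguity:
# "a" (ASCII), U+00E9 (e with acute accent), U+91CD (first character
# of the string in the question), U+1F600 (an emoji outside the BMP).
samples = ["a", "\u00e9", "\u91cd", "\U0001F600"]
widths = [utf8_width(ch) for ch in samples]
print(widths)  # prints: [1, 2, 3, 4]

# The same two-character string in UTF-16 (little-endian, no byte order mark)
# takes two bytes per character, so the byte length depends on the encoding.
utf16_bytes = len("\u91cd\u5e86".encode("utf-16-le"))
print(utf16_bytes)  # prints: 4
```

The character count stays at 2 in every case; only the byte count changes with the encoding, which is why CHAR_LENGTH() is the function to use when counting characters.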
License: CC-BY-SA with attribution
Not affiliated with StackOverflow