How do I convert control characters in MySQL from latin1 to UTF-8?
16-10-2019
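Question
While converting a database to UTF-8, I noticed strange behavior for the control characters 0x80-0x9f. For example, with the method below, 0x92 (right apostrophe) is not converted to UTF-8, and the rest of the column's content is truncated: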
CREATE TABLE `bar` (
`content` text
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO bar VALUES (0x8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F);
Query OK, 1 row affected (0.06 sec)
SELECT content FROM bar;
+---------------------------------------------------------------------------------+
| content |
+---------------------------------------------------------------------------------+
| €‚ƒ„…†‡‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ |
+---------------------------------------------------------------------------------+
1 row in set (0.06 sec)
ALTER TABLE bar CHANGE content content TEXT CHARACTER SET UTF8;
Query OK, 1 row affected, 1 warning (0.06 sec)
Records: 1 Duplicates: 0 Warnings: 1
SHOW WARNINGS;
+---------+------+-------------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\x80\x81\x82\x83\x84\x85...' for column 'content' at row 1 |
+---------+------+-------------------------------------------------------------------------------------+
1 row in set (0.06 sec)
SELECT * FROM bar;
+---------+
| content |
+---------+
| |
+---------+
1 row in set (0.06 sec)
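Although 0x80-0x9f are normally not allowed in Latin1, MySQL seems to treat them differently:
MySQL's latin1 is the same as the Windows cp1252 character set. This means it is the same as the official ISO 8859-1 or IANA (Internet Assigned Numbers Authority) latin1, except that IANA latin1 treats the code points between 0x80 and 0x9f as "undefined," whereas cp1252, and therefore MySQL's latin1, assign characters for those positions. SRC
Yet MySQL seems unable to convert the values above from its latin1 character set to its UTF-8 character set.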
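As a point of reference for what the conversion is expected to produce, here is a minimal Python sketch of the CP1252 mapping described in the quote. It does not touch MySQL at all. It only uses Python's built-in `cp1252`, `latin-1` and `utf-8` codecs, and the helper name `to_utf8_hex` is made up for this illustration. Python's `cp1252` codec leaves five bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) unmapped, which is why the sketch decodes with `errors="replace"`; how MySQL treats those particular bytes is not covered here.

```python
# Bytes from the INSERT statement in the question (0x80-0x9F, without 0x88).
RAW = bytes.fromhex(
    "8081828384858687898A8B8C8D8E8F"
    "909192939495969798999A9B9C9D9E9F"
)


def to_utf8_hex(raw: bytes, source_encoding: str) -> str:
    """Decode raw bytes with source_encoding and return the UTF-8 bytes as hex."""
    text = raw.decode(source_encoding, errors="replace")
    return text.encode("utf-8").hex().upper()


# 0x92 read as CP1252 is U+2019 (right single quotation mark),
# which UTF-8 encodes as three bytes.
print(to_utf8_hex(b"\x92", "cp1252"))   # E28099

# The same byte read as ISO 8859-1 is the C1 control character U+0092.
print(to_utf8_hex(b"\x92", "latin-1"))  # C292

# Whole sample string, with unmapped bytes shown as U+FFFD.
print(RAW.decode("cp1252", errors="replace"))
```

If the converted column holds `E28099` where the original held `92`, the byte was treated as CP1252; `C292` would mean it was treated as a plain ISO 8859-1 control character.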
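These characters are getting into my database through copy/paste from Word documents (CP1252). Although I may have found a way to make the application supply correct UTF-8 values for new entries, I need to make sure the old ones are converted correctly.
Is there any way in MySQL to convert them to their UTF-8 equivalents without going through every row and replacing them with ASCII-friendly versions?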
Solution
I'm not sure. I tried to reproduce your problem, but the ALTER worked fine for me.
test > CREATE TABLE `bar` ( `content` text ) ENGINE=MyISAM DEFAULT CHARSET=latin1; INSERT INTO bar VALUES (0x8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F);
Query OK, 0 rows affected (0.02 sec)
Query OK, 1 row affected (0.00 sec)
test > ALTER TABLE bar CHANGE content content TEXT CHARACTER SET UTF8;
Query OK, 1 row affected (0.04 sec)
Records: 1 Duplicates: 0 Warnings: 0
test > select * from bar;
+---------------------------------+
| content |
+---------------------------------+
| ����������������������������� |
+---------------------------------+
1 row in set (0.00 sec)
test > set names utf8;
Query OK, 0 rows affected (0.00 sec)
test > select * from bar;
+---------------------------------------------------------------------------------+
| content |
+---------------------------------------------------------------------------------+
| €‚ƒ„…†‡‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ |
+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Here are my relevant character set settings:
test > show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
EDIT
My character set settings before running SET NAMES utf8:
test > show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
Version
test > select version();
+-------------------------+
| version() |
+-------------------------+
| 5.1.41-3ubuntu12.10-log |
+-------------------------+
1 row in set (0.00 sec)
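One plausible reading of the two `SELECT` results above is that the `ALTER` succeeded and only the connection character set changed what the terminal received. The Python sketch below mimics that effect under stated assumptions: the terminal interprets incoming bytes as UTF-8, and `character_set_results` decides which encoding the server sends. The function name `as_seen_in_utf8_terminal` is hypothetical, the sample string is a short stand-in for the full column value, and the snippet models the symptom rather than MySQL's actual code path.

```python
# A few of the stored characters: euro sign, right single quote, curly double quotes.
STORED = "\u20ac\u2019\u201c\u201d"


def as_seen_in_utf8_terminal(text: str, results_charset: str) -> str:
    """Encode text as the server would send it, then decode it as a UTF-8 terminal would."""
    wire_bytes = text.encode(results_charset)
    return wire_bytes.decode("utf-8", errors="replace")


# character_set_results = latin1 (CP1252 in MySQL): one byte per character,
# none of them valid UTF-8 on its own, so each shows up as U+FFFD.
print(as_seen_in_utf8_terminal(STORED, "cp1252"))

# character_set_results = utf8: the bytes are valid UTF-8 and display correctly.
print(as_seen_in_utf8_terminal(STORED, "utf-8"))
```

Under these assumptions the data is intact in both cases, and the replacement characters are purely a display issue that `SET NAMES utf8` removes.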
Other tips
You may have to switch the character set settings to cp1250 before loading the data.
First I ran
mysql> show character set like 'cp%';
+---------+---------------------------+-------------------+--------+
| Charset | Description | Default collation | Maxlen |
+---------+---------------------------+-------------------+--------+
| cp850 | DOS West European | cp850_general_ci | 1 |
| cp1250 | Windows Central European | cp1250_general_ci | 1 |
| cp866 | DOS Russian | cp866_general_ci | 1 |
| cp852 | DOS Central European | cp852_general_ci | 1 |
| cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |
| cp1256 | Windows Arabic | cp1256_general_ci | 1 |
| cp1257 | Windows Baltic | cp1257_general_ci | 1 |
| cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 |
+---------+---------------------------+-------------------+--------+
8 rows in set (0.00 sec)
CP1252 is not listed here. The closest match is CP1250.
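Before treating cp1250 as a stand-in, it may help to see where the two code pages disagree in the 0x80-0x9F range. The sketch below uses Python's built-in `cp1250` and `cp1252` codecs purely as a reference for the code page tables (the helper `differing_bytes` is a name invented here), and it says nothing about how a particular MySQL build behaves. Given the quote in the question that MySQL's latin1 is cp1252, the absence of a separate `cp1252` entry in the list is perhaps unsurprising.

```python
def differing_bytes(enc_a: str, enc_b: str, start: int = 0x80, stop: int = 0x9F) -> dict:
    """Map each byte in [start, stop] to its (enc_a, enc_b) characters when they differ."""
    diffs = {}
    for value in range(start, stop + 1):
        raw = bytes([value])
        # Unassigned positions decode to U+FFFD instead of raising.
        a = raw.decode(enc_a, errors="replace")
        b = raw.decode(enc_b, errors="replace")
        if a != b:
            diffs[value] = (a, b)
    return diffs


for value, (in_1252, in_1250) in differing_bytes("cp1252", "cp1250").items():
    print(f"0x{value:02X}: cp1252={in_1252!r} cp1250={in_1250!r}")
```

With these tables, 0x92 maps to the same character in both code pages, while bytes such as 0x8C do not, so a cp1250 connection is not a lossless substitute for every CP1252 byte.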
Try this sequence:
drop database if exists dtest;
create database dtest;
use dtest
set names cp1250;
CREATE TABLE `bar` (
`content` text
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
INSERT INTO bar VALUES (0x8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F);
SELECT content FROM bar;
SHOW VARIABLES LIKE '%char%';
set names utf8;
SHOW VARIABLES LIKE '%char%';
ALTER TABLE bar CHANGE content content TEXT CHARACTER SET UTF8;
SELECT content FROM bar;
See what happens.
This is what I got with MySQL 5.5.19 on Linux:
mysql> drop database if exists dtest;
Query OK, 0 rows affected (0.00 sec)
mysql> create database dtest;
Query OK, 1 row affected (0.00 sec)
mysql> use dtest
Database changed
mysql> set names cp1250;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE `bar` (
-> `content` text
-> ) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO bar VALUES (0x8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F);
Query OK, 1 row affected (0.00 sec)
mysql> SELECT content FROM bar;
+---------------------------------+
| content |
+---------------------------------+
| ??
?????? |
+---------------------------------+
1 row in set (0.00 sec)
mysql> SHOW VARIABLES LIKE '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | cp1250 |
| character_set_connection | cp1250 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | cp1250 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
mysql> ALTER TABLE bar CHANGE content content TEXT CHARACTER SET UTF8;
Query OK, 1 row affected (0.01 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> SELECT content FROM bar;
+---------------------------------------------------------------------------------+
| content |
+---------------------ŽÂÂâââââ---------------------------------------------------+
| â¬ÂâÆââ¦â â¡â°Å â¹Å ¢Å¡âºÅÂ
+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql>
This is what I got with MySQL 5.5.12 on a Windows 7 machine:
mysql> drop database if exists dtest;
Query OK, 1 row affected (0.00 sec)
mysql> create database dtest;
Query OK, 1 row affected (0.02 sec)
mysql> use dtest
Database changed
mysql> set names cp1250;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE `bar` (
-> `content` text
-> ) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
Query OK, 0 rows affected (0.06 sec)
mysql> INSERT INTO bar VALUES (0x8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F);
Query OK, 1 row affected (0.00 sec)
mysql> SELECT content FROM bar;
+---------------------------------+
| content |
+---------------------------------+
| Ç?é?äàåçëèï??Ä??æÆôöòûù?ÖÜ¢??₧? |
+---------------------------------+
1 row in set (0.00 sec)
mysql> SHOW VARIABLES LIKE '%char%';
+--------------------------+---------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------+
| character_set_client | cp1250 |
| character_set_connection | cp1250 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | cp1250 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | C:\MySQL_5.5.12\share\charsets\ |
+--------------------------+---------------------------------+
8 rows in set (0.00 sec)
mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE '%char%';
+--------------------------+---------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | C:\MySQL_5.5.12\share\charsets\ |
+--------------------------+---------------------------------+
8 rows in set (0.00 sec)
mysql> ALTER TABLE bar CHANGE content content TEXT CHARACTER SET UTF8;
Query OK, 1 row affected (0.06 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> SELECT content FROM bar;
+---------------------------------------------------------------------------------+
| content |
+---------------------------------------------------------------------------------+
| €‚ƒ„…†‡‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ |
+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql>
Give it a try!
Not affiliated with dba.stackexchange