Question

I have a Client/Server architecture where messages in text-format are exchanged.

For example:

12  2013/11/11  abcd  5
^     ^          ^    ^
int  date      text  int

Everything works fine with "normal" text. Now this is a Chinese project, so they also want to send Chinese characters, encoded as GB18030 or GB2312.

I read the data this way:

char[] dataIn = binaryReader.ReadChars(length);

Then I create a new string from the char array and convert it to the right data type (int, float, string, etc.).

How can I change/enable Chinese encoding, or convert the string values to Chinese? And what would be a good and easy way to test this? Thanks.

I tried using something like this

string stringData = new string(dataIn).Trim();
byte[] data = Encoding.Unicode.GetBytes(stringData);
stringData = Encoding.GetEncoding("GB18030").GetString(data);

Without success.

Also, I need to save some text values to MS SQL Server 2008. Is this possible, and do I need to configure anything special?

I also tried the following example, storing the result in the database and printing it to the console, but I just get ????????

string chinese = "123东北特钢大连新基地testtest"; 
byte[] utfBytes = Encoding.Unicode.GetBytes(chinese); 
byte[] chineseBytes = Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding("GB18030"), utfBytes); 
string msg = Encoding.GetEncoding("GB18030").GetString(chineseBytes);

Edit: The problem was with the INSERT queries that I send to the database. I fixed it by putting N before the string literal.

sqlCommand = string.Format("INSERT INTO uber_chinese (columnName) VALUES(N'{0}')", myChineseString);

Also, the column data type has to be nvarchar instead of varchar.


Solution

This answer is "promoted" (at the request of the original poster) from my own comments.

In the .NET Framework, strings are already Unicode strings.

(Don't test Unicode strings by writing to the console, though, since the terminal window and console typically won't display them correctly. However, since .NET version 4.5 there is some support for this.)
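
As for an easy way to test without trusting the console: one option is to print the numeric value of each char, which is plain ASCII and survives any terminal. The following is only a minimal sketch; the class name CodePointDump and the helper ToCodeUnits are made up for illustration and are not part of the question.

```csharp
using System;
using System.Linq;

static class CodePointDump
{
    // Hypothetical helper: renders each UTF-16 code unit as U+XXXX,
    // which any console can display regardless of its code page.
    public static string ToCodeUnits(string text)
    {
        return string.Join(" ", text.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    static void Main()
    {
        // "\u4E1C\u5317" is the first two characters of the sample string (东北).
        // Escapes are used so the result does not depend on the source file encoding.
        Console.WriteLine(ToCodeUnits("\u4E1C\u5317"));
    }
}
```

If the values printed for a received message match the ones you expect for the text that was sent, the decoding step is fine and any remaining question marks come from the display or the database.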

The thing to be aware of is the Encoding when you get text from an outside source. In this case, the constructor of BinaryReader offers an overload that takes in an Encoding:

using (var binaryReader = new BinaryReader(yourStream, Encoding.GetEncoding("GB18030")))
    ...
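
One thing to watch: ReadChars(length) counts characters, not bytes, and a Chinese character takes more than one byte in GB18030. If the length in your protocol is a byte count (an assumption on my part; the question does not say), it may be safer to read the bytes and decode them yourself. A rough round-trip sketch follows; Gb18030Demo and ReadFieldByBytes are hypothetical names, and the MemoryStream stands in for your network stream.

```csharp
using System;
using System.IO;
using System.Text;

static class Gb18030Demo
{
    // Hypothetical helper: reads a field whose length is given in BYTES
    // and decodes it with the supplied encoding.
    public static string ReadFieldByBytes(BinaryReader reader, int byteLength, Encoding encoding)
    {
        byte[] raw = reader.ReadBytes(byteLength);
        return encoding.GetString(raw).Trim();
    }

    static void Main()
    {
        // Required on .NET Core / .NET 5+, where code-page encodings are opt-in.
        // On the classic .NET Framework GB18030 is built in; remove this line there.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding gb = Encoding.GetEncoding("GB18030");

        // Sample string from the question; save the source file as UTF-8.
        string original = "123东北特钢大连新基地testtest";
        byte[] wire = gb.GetBytes(original); // what the sender would put on the wire

        // Variant 1: the length is a byte count.
        using (var reader = new BinaryReader(new MemoryStream(wire), gb))
        {
            Console.WriteLine(ReadFieldByBytes(reader, wire.Length, gb) == original);
        }

        // Variant 2: the length is a character count, so ReadChars can be used
        // because the reader was constructed with the matching encoding.
        using (var reader = new BinaryReader(new MemoryStream(wire), gb))
        {
            string decoded = new string(reader.ReadChars(original.Length));
            Console.WriteLine(decoded == original);
        }
    }
}
```

This also doubles as a test harness: encode a known string on one side, decode it on the other, and compare the two strings in code rather than by eye.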

On the SQL Server, be sure that any column that needs to hold Chinese strings is of type nvarchar (or nchar), not just varchar (char). Otherwise, depending on the collation, the column may not be able to hold general Unicode characters (it may be represented internally by some 8-bit Microsoft code page).

Whenever you give an nchar literal in SQL, use the format N'my text', not just 'my text', to make sure the literal is interpreted as an nchar rather than just char. For example N'Erdős' is distinct from N'Erdos' while, in many collations, 'Erdős' and 'Erdos' might be (projected onto) the same value in the underlying code page.

Similarly N'东北特钢大连新基地' will work, while '东北特钢大连新基地' might result in a lot of question marks. From the update to your question:

sqlCommand = string.Format("INSERT INTO uber_chinese (columnName) VALUES(N'{0}')", myChineseString);
                                                                         ↑

(This is prone to SQL injection, of course.)
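
A parameterized command avoids both the injection risk and the need for the N prefix, because the value is sent as a typed Unicode parameter rather than pasted into the SQL text. This is only a sketch: the table and column names are taken from your query, while the class name ChineseInsert, the parameter name @text and the size 100 are placeholders I made up. On the classic .NET Framework System.Data.SqlClient is built in; on newer .NET it comes from a NuGet package.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

static class ChineseInsert
{
    // Builds the INSERT with a typed parameter instead of string.Format.
    public static SqlCommand BuildInsert(SqlConnection connection, string myChineseString)
    {
        var command = new SqlCommand(
            "INSERT INTO uber_chinese (columnName) VALUES (@text)", connection);
        // NVarChar sends the value as Unicode, so no N'...' literal is needed.
        // The size 100 is a placeholder; match it to the real column definition.
        command.Parameters.Add("@text", SqlDbType.NVarChar, 100).Value = myChineseString;
        return command;
    }

    static void Main()
    {
        // No connection is opened here; this only shows how the command is built.
        using (SqlCommand command = BuildInsert(null, "\u4E1C\u5317")) // 东北
        {
            Console.WriteLine(command.CommandText);
            Console.WriteLine(command.Parameters["@text"].SqlDbType);
        }
    }
}
```

In real use you would pass an open SqlConnection and call ExecuteNonQuery() on the returned command.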

The default collation of your column will be that of your database (SQL_Latin1_General_CP1_CI_AS from your comment). Unless you ORDER BY that column, or similar, that will probably be fine. If you do order by this column, consider using some Chinese language collation for the column (or for the entire database).

Licensed under: CC-BY-SA with attribution