Imaging
Don't use Graphics.DrawString for Unicode characters. You should migrate to TextRenderer.DrawText instead, for example:
var flags = TextFormatFlags.Default; // combine other TextFormatFlags values as needed
TextRenderer.DrawText(e.Graphics, "こんにちは", this.Font,
new Point(10, 10), this.ForeColor, this.BackColor, flags);
The drawback is that you won't be able to specify a Brush; TextRenderer.DrawText takes a foreground Color instead.
I have tested it, and it seems to work for me, so I think something else must be going on. Here is my code:
private void Form1_Paint(object sender, PaintEventArgs e)
{
var text = " །༉ᵒᵗᵗ͟ᵋༀ 🐢 ͟ ͟ ͟ ͟ ͟ ͟🐢 ͟ ͟ ͟ ͟ ͟ ͟ ͟🐛 ͟ ͟ ͟ ͟ ͟ ͟🐢. . . ";
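// TextRenderer draws with GDI
TextRenderer.DrawText(e.Graphics, "TextRenderer.DrawText" + text, this.Font,
new Point(10, 10), this.ForeColor, this.BackColor);
// Graphics.DrawString draws with GDI+; the brush is disposed after use
using (var brush = new SolidBrush(this.ForeColor))
{
e.Graphics.DrawString("Graphics.DrawString" + text, this.Font,
brush, new PointF(10, 30));
}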
}
Note: the font is Arial Unicode MS, 8.25pt.
Encoding
Here is the original string, stored in UTF-8:
[rotten4pple] །༉ᵒᵗᵗ͟ᵋༀ 🐢 ͟ ͟ ͟ ͟ ͟ ͟🐢 ͟ ͟ ͟ ͟ ͟ ͟ ͟🐛 ͟ ͟ ͟ ͟ ͟ ͟🐢. . .
And here is the wrong string you are getting, stored in Windows-1252:
[rotten4pple] à¼à¼‰áµ’ᵗᵗ͟ᵋༀ 🢠͟ ÍŸ ÍŸ ÍŸ ÍŸ ͟🢠͟ ÍŸ ÍŸ ÍŸ ÍŸ ÍŸ ͟🛠͟ ÍŸ ÍŸ ÍŸ ÍŸ ÍŸðŸ¢. . .
The two are binary-equal. This is the hexadecimal representation of the bytes for both strings, each in its respective encoding:
5B 72 6F 74 74 65 6E 34 70 70 6C 65 5D 20 E0 BC
8D E0 BC 89 E1 B5 92 E1 B5 97 E1 B5 97 CD 9F E1
B5 8B E0 BC 80 EF A3 BF 20 F0 9F 90 A2 20 CD 9F
20 CD 9F 20 CD 9F 20 CD 9F 20 CD 9F 20 CD 9F F0
9F 90 A2 20 CD 9F 20 CD 9F 20 CD 9F 20 CD 9F 20
CD 9F 20 CD 9F 20 CD 9F F0 9F 90 9B 20 CD 9F 20
CD 9F 20 CD 9F 20 CD 9F 20 CD 9F 20 CD 9F F0 9F
90 A2 2E 20 2E 20 2E
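The binary equality can be checked with a short C# sketch. It assumes .NET Core 3.0+ or .NET 5+, where code page 1252 only becomes available after registering CodePagesEncodingProvider, and it uses just the first part of the example string to keep the output short:

```csharp
using System;
using System.Linq;
using System.Text;

// Code page 1252 is not available on .NET Core 3.0+ / .NET 5+ until the
// code-pages provider is registered (.NET Framework has it built in).
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding win1252 = Encoding.GetEncoding(1252);

string original = "[rotten4pple] \u0F00";            // "[rotten4pple] ༀ"
string garbled = "[rotten4pple] \u00E0\u00BC\u20AC"; // "[rotten4pple] à¼€"

byte[] originalAsUtf8 = Encoding.UTF8.GetBytes(original); // correct string, UTF-8
byte[] garbledAs1252 = win1252.GetBytes(garbled);         // wrong string, Windows-1252

// Prints 5B-72-6F-74-74-65-6E-34-70-70-6C-65-5D-20-E0-BC-80
Console.WriteLine(BitConverter.ToString(originalAsUtf8));
// Prints True: both byte sequences are identical
Console.WriteLine(originalAsUtf8.SequenceEqual(garbledAs1252));
```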
Since this is a re-interpretation of the binary values and not a re-encoding, converting from one to the other with Encoding.Convert in .NET is not viable. Instead, get the binary representation of the string in the wrong encoding and read it as the correct encoding directly:
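// On .NET Core / .NET 5+, call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.
var text = cmd.AllArguments;
// Encode with the wrong code page (Windows-1252) to get the original bytes back...
var bytes = Encoding.GetEncoding(1252).GetBytes(text);
// ...and decode them with the correct one (UTF-8).
text = Encoding.UTF8.GetString(bytes);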
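To see both the fix and why Encoding.Convert does not help, here is a self-contained sketch that first simulates the bug and then repairs it. It assumes .NET Core 3.0+ or .NET 5+ (hence the provider registration); Garble and Repair are hypothetical helper names for illustration, and the sample text is a prefix of the string above:

```csharp
using System;
using System.Text;

// Makes code page 1252 available on .NET Core 3.0+ / .NET 5+.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding win1252 = Encoding.GetEncoding(1252);

// Hypothetical helper that reproduces the bug: UTF-8 bytes read as Windows-1252.
string Garble(string s) => win1252.GetString(Encoding.UTF8.GetBytes(s));

// Hypothetical helper that applies the fix: get the bytes back with the
// wrong encoding, then decode them with the right one.
string Repair(string s) => Encoding.UTF8.GetString(win1252.GetBytes(s));

string original = "[rotten4pple] \u0F00\u1D52\u1D57\u1D57"; // "[rotten4pple] ༀᵒᵗᵗ"
string garbled = Garble(original);
string repaired = Repair(garbled);

// Encoding.Convert re-encodes the characters, so the garbled text survives as-is.
byte[] converted = Encoding.Convert(win1252, Encoding.UTF8, win1252.GetBytes(garbled));
string viaConvert = Encoding.UTF8.GetString(converted);

Console.WriteLine(repaired == original);  // True
Console.WriteLine(viaConvert == garbled); // True: still mojibake
```

Encoding.Convert maps characters to characters, so it faithfully preserves the mojibake; only re-reading the same bytes with a different encoding undoes it.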
Notes
You have been asking which encoding the API you are using uses by default. I'm not familiar with that API, but there is a risk that it depends on the configuration of the machine. You should look for an overload that lets you specify that you are receiving a UTF-8 string.
Chances are that you are actually receiving a byte[] anyway, in which case you can call Encoding.UTF8.GetString directly on it. If you cannot specify the encoding, consider switching to sending byte[] instead, so that you have more control over the encoding.
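A minimal sketch of the byte[] approach, assuming both sides agree on UTF-8. A MemoryStream stands in for whatever transport is actually used (for example a NetworkStream), and Serialize/Deserialize are hypothetical names:

```csharp
using System;
using System.IO;
using System.Text;

// Sender: choose the encoding explicitly and transmit raw bytes.
byte[] Serialize(string text) => Encoding.UTF8.GetBytes(text);

// Receiver: decode with the same, agreed-upon encoding.
string Deserialize(byte[] payload) => Encoding.UTF8.GetString(payload);

string message = "\u0F00 \U0001F422"; // "ༀ 🐢"

// A MemoryStream stands in for the real transport.
using var channel = new MemoryStream();
byte[] sent = Serialize(message);
channel.Write(sent, 0, sent.Length);

byte[] received = channel.ToArray();
string decoded = Deserialize(received);

Console.WriteLine(received.Length);    // 8: 3 bytes for U+0F00, 1 for the space, 4 for the turtle
Console.WriteLine(decoded == message); // True
```

Because the encoding is fixed in code on both ends, neither side's regional settings can affect the result.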
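In that regard, don't use Encoding.Default: on .NET Framework it is the ANSI code page of the machine (an "extended ASCII" code page that depends on the system's language settings), so it varies from machine to machine.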
By the way, UTF-8 is a good choice for networking, not only because it is independent of the language and other regional configuration, but also because it is independent of byte order (endianness).
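A small C# illustration of the byte-order point: the same two characters produce different bytes in little-endian and big-endian UTF-16, but a single sequence in UTF-8 (GetBytes does not emit a byte order mark, so none appears here):

```csharp
using System;
using System.Text;

string sample = "A\u00E9"; // "Aé"

// UTF-16 has two byte orders, so the bytes depend on which one is chosen.
string utf16Le = BitConverter.ToString(Encoding.Unicode.GetBytes(sample));
string utf16Be = BitConverter.ToString(Encoding.BigEndianUnicode.GetBytes(sample));
// UTF-8 is defined as a sequence of single bytes, so there is only one order.
string utf8 = BitConverter.ToString(Encoding.UTF8.GetBytes(sample));

Console.WriteLine(utf16Le); // 41-00-E9-00
Console.WriteLine(utf16Be); // 00-41-00-E9
Console.WriteLine(utf8);    // 41-C3-A9
```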