How do I convert a UTF-8 byte[] to a string?

https://stackoverflow.com/questions/1003275

05-07-2019

Question

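I have a byte[] array that is loaded from a file that I happen to know contains UTF-8. In some debugging code, I need to convert it to a string. Is there a one-liner that will do this?

Under the covers it should be just an allocation and a memcopy, so even if it is not implemented, it should be possible.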

Solution

string result = System.Text.Encoding.UTF8.GetString(byteArray);
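As a minimal sketch of the one-liner in context, the snippet below decodes a small hand-written buffer. The sample bytes and the name roundTrip are assumptions made for illustration; only Encoding.UTF8.GetString and Encoding.UTF8.GetBytes come from the framework. It is written with top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Text;

// Hypothetical sample data: the UTF-8 bytes of "héllo" ('é' is 0xC3 0xA9).
byte[] byteArray = { 0x68, 0xC3, 0xA9, 0x6C, 0x6C, 0x6F };

string result = Encoding.UTF8.GetString(byteArray);
Console.WriteLine(result.Length);   // 5 (six bytes decode to five chars)

// Valid UTF-8 survives a round trip back to bytes.
byte[] roundTrip = Encoding.UTF8.GetBytes(result);
Console.WriteLine(roundTrip.Length == byteArray.Length);   // True
```

Note that the decoded string has fewer chars than the buffer has bytes: .NET strings are UTF-16, so the conversion transcodes the data rather than copying it byte for byte.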

Other tips

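There are at least four different ways to do this conversion.

Encoding's GetString
Works for text, but if the bytes are not valid in that encoding (for example, arbitrary binary data containing non-ASCII bytes), you won't be able to get the original bytes back.
BitConverter.ToString
The output is a "-" delimited hex string, but there is no single built-in .NET method that converts this delimited string back to a byte array.
Convert.ToBase64String
You can easily convert the output string back to a byte array by using Convert.FromBase64String.
Note: the output string can contain '+', '/' and '='. If you want to use the string in a URL, you need to encode it explicitly.
HttpServerUtility.UrlTokenEncode
You can easily convert the output string back to a byte array by using HttpServerUtility.UrlTokenDecode. The output string is already URL friendly! The downside is that it needs the System.Web assembly if your project is not a web project.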

A full example:

byte[] bytes = { 130, 200, 234, 23 }; // A byte array that is not valid UTF-8 (arbitrary binary data)

string s1 = Encoding.UTF8.GetString(bytes); // "\uFFFD\uFFFD\uFFFD\u0017": three replacement characters plus U+0017
byte[] decBytes1 = Encoding.UTF8.GetBytes(s1);  // decBytes1.Length == 10 !!
// decBytes1 is not the same as bytes
// Using another Encoding object instead of UTF-8 gives similar results

string s2 = BitConverter.ToString(bytes);   // 82-C8-EA-17
String[] tempAry = s2.Split('-');
byte[] decBytes2 = new byte[tempAry.Length];
for (int i = 0; i < tempAry.Length; i++)
    decBytes2[i] = Convert.ToByte(tempAry[i], 16);
// decBytes2 same as bytes

string s3 = Convert.ToBase64String(bytes);  // gsjqFw==
byte[] decBytes3 = Convert.FromBase64String(s3);
// decBytes3 same as bytes

string s4 = HttpServerUtility.UrlTokenEncode(bytes);    // gsjqFw2
byte[] decBytes4 = HttpServerUtility.UrlTokenDecode(s4);
// decBytes4 same as bytes
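For the hex round trip, a shorter variant is possible on newer runtimes. The sketch below assumes .NET 5 or later, where Convert.ToHexString and Convert.FromHexString are available; on older frameworks the manual loop from the example above is still needed. The variable names are chosen here for illustration.

```csharp
using System;

byte[] bytes = { 130, 200, 234, 23 };

// Convert.ToHexString / Convert.FromHexString exist in .NET 5 and later.
string hex = Convert.ToHexString(bytes);        // 82C8EA17
byte[] decoded = Convert.FromHexString(hex);

// The dash-delimited BitConverter format can be parsed once the dashes are removed.
string dashed = BitConverter.ToString(bytes);   // 82-C8-EA-17
byte[] fromDashed = Convert.FromHexString(dashed.Replace("-", ""));

Console.WriteLine(decoded.AsSpan().SequenceEqual(bytes));      // True
Console.WriteLine(fromDashed.AsSpan().SequenceEqual(bytes));   // True
```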

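A general solution for converting a byte array to a string when you don't know the encoding. Note that StreamReader does not truly guess: it detects the encoding from a byte order mark if one is present, and otherwise assumes UTF-8: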

static string BytesToStringConverted(byte[] bytes)
{
    using (var stream = new MemoryStream(bytes))
    {
        using (var streamReader = new StreamReader(stream))
        {
            return streamReader.ReadToEnd();
        }
    }
}
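To make the limits of this approach concrete, here is a small sketch that feeds the same helper two hand-written buffers, one with a byte order mark and one without. The sample buffers and variable names are assumptions for illustration; the helper body matches the one above, written with using declarations (C# 8 or later).

```csharp
using System;
using System.IO;

static string BytesToStringConverted(byte[] bytes)
{
    using var stream = new MemoryStream(bytes);
    using var reader = new StreamReader(stream);
    return reader.ReadToEnd();
}

// UTF-16 LE with a byte order mark (FF FE): the BOM tells the reader which encoding to use.
byte[] utf16WithBom = { 0xFF, 0xFE, 0x68, 0x00, 0x69, 0x00 };
// The same text as UTF-16 LE without a BOM: the reader falls back to UTF-8.
byte[] utf16NoBom = { 0x68, 0x00, 0x69, 0x00 };

string detected = BytesToStringConverted(utf16WithBom);
string guessed = BytesToStringConverted(utf16NoBom);

Console.WriteLine(detected);         // hi
Console.WriteLine(guessed.Length);   // 4, because the NUL bytes are decoded as characters
```

Without a BOM the result is silently wrong rather than an error, so this helper is best treated as a convenience for files that are either UTF-8 or carry a BOM.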

Definition:

public static string ConvertByteToString(this byte[] source)
{
    return source != null ? System.Text.Encoding.UTF8.GetString(source) : null;
}

Usage:

string result = input.ConvertByteToString();

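Converting a byte[] to a string seems simple, but any kind of encoding is likely to mess up the output string. This little function maps each byte to the character with the same code point (which is what ISO-8859-1 decoding does), so it never fails, but it will not reconstruct multi-byte UTF-8 characters:

private string ToString(byte[] bytes)
{
    string response = string.Empty;

    foreach (byte b in bytes)
        response += (Char)b;

    return response;
}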
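The difference between casting bytes to chars and real UTF-8 decoding can be seen with a two-byte example. This is only an illustrative sketch: the sample bytes and variable names are assumptions, and the comparison against ISO-8859-1 is added here to show what the cast is equivalent to.

```csharp
using System;
using System.Text;

// UTF-8 bytes of "é" (U+00E9): two bytes, 0xC3 0xA9.
byte[] bytes = { 0xC3, 0xA9 };

// Byte-to-char casting treats each byte as its own code point (ISO-8859-1 behavior).
var sb = new StringBuilder();
foreach (byte b in bytes)
    sb.Append((char)b);
string cast = sb.ToString();

string utf8 = Encoding.UTF8.GetString(bytes);
string latin1 = Encoding.GetEncoding("iso-8859-1").GetString(bytes);

Console.WriteLine(cast.Length);       // 2 ("Ã©")
Console.WriteLine(utf8.Length);       // 1 ("é")
Console.WriteLine(cast == latin1);    // True
```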

Using (byte)b.ToString("x2"), outputs b4b5dfe475e58b67

public static class Ext
{
    public static string ToHexString(this byte[] hex)
    {
        if (hex == null) return null;
        if (hex.Length == 0) return string.Empty;

        var s = new StringBuilder();
        foreach (byte b in hex)
        {
            s.Append(b.ToString("x2"));
        }
        return s.ToString();
    }

    public static byte[] ToHexBytes(this string hex)
    {
        if (hex == null) return null;
        if (hex.Length == 0) return new byte[0];

        int l = hex.Length / 2;
        var b = new byte[l];
        for (int i = 0; i < l; ++i)
        {
            b[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }
        return b;
    }

    public static bool EqualsTo(this byte[] bytes, byte[] bytesToCompare)
    {
        if (bytes == null && bytesToCompare == null) return true; // ?
        if (bytes == null || bytesToCompare == null) return false;
        if (object.ReferenceEquals(bytes, bytesToCompare)) return true;

        if (bytes.Length != bytesToCompare.Length) return false;

        for (int i = 0; i < bytes.Length; ++i)
        {
            if (bytes[i] != bytesToCompare[i]) return false;
        }
        return true;
    }
}

There is also the UnicodeEncoding class, which is quite simple to use (note that it encodes as UTF-16, not UTF-8):

UnicodeEncoding ByteConverter = new UnicodeEncoding();
string stringDataForEncoding = "My Secret Data!";
byte[] dataEncoded = ByteConverter.GetBytes(stringDataForEncoding);

Console.WriteLine("Data after decoding: {0}", ByteConverter.GetString(dataEncoded));

Alternatively:

var byteStr = Convert.ToBase64String(bytes);

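A Linq one-liner for converting a byte array byteArrFilename, read from a file, to a pure ASCII C-style zero-terminated string would be the following. It is handy for reading things like file index tables in old archive formats.

String filename = new String(byteArrFilename.TakeWhile(x => x != 0)
                             .Select(x => x < 128 ? (Char)x : '?').ToArray());

I use '?' as the default character for anything that is not pure ASCII here, but that can be changed, of course. If you want to be sure you can detect it, just use '\0' instead, since the TakeWhile at the start ensures that a string built this way cannot possibly contain '\0' values from the input source.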

The BitConverter class can be used to convert a byte[] to a string.

var convertedString = BitConverter.ToString(byteArray);

Documentation for the BitConverter class can be found on MSDN.

To my knowledge, none of the given answers guarantees correct behavior with null termination. Until someone shows me otherwise, I wrote my own static class for handling this, with the following methods:

// Mimics the functionality of strlen() in C/C++.
// Needed because neither StringBuilder nor Encoding.*.GetString() handles \0 well.
static int StringLength(byte[] buffer, int startIndex = 0)
{
    int strlen = 0;
    while ((startIndex + strlen) < buffer.Length   // Stay inside the buffer, even without a terminator
        && buffer[startIndex + strlen] != 0)       // The typical null termination check
    {
        ++strlen;
    }
    return strlen;
}

// This is messy, but I haven't found a built-in way in C# that guarantees null termination.
public static string ParseBytes(byte[] buffer, out int strlen, int startIndex = 0)
{
    strlen = StringLength(buffer, startIndex);
    byte[] c_str = new byte[strlen];
    Array.Copy(buffer, startIndex, c_str, 0, strlen);
    return Encoding.UTF8.GetString(c_str);
}

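The reason for startIndex is that in the example I was working on, I needed to parse a byte[] as an array of null-terminated strings. It can be safely ignored in the simple case.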
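The "array of null-terminated strings" case can also be sketched with framework calls only. The helper name ParseNullTerminatedStrings and the sample data are hypothetical; the sketch relies on Array.IndexOf to find each terminator and on the Encoding.UTF8.GetString(byte[], int, int) overload to decode a slice without a temporary copy.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: splits a buffer into the NUL-terminated UTF-8 strings it contains.
static List<string> ParseNullTerminatedStrings(byte[] buffer)
{
    var results = new List<string>();
    int start = 0;
    while (start < buffer.Length)
    {
        // Find the next terminator; if there is none, read to the end of the buffer.
        int end = Array.IndexOf(buffer, (byte)0, start);
        if (end < 0) end = buffer.Length;

        // Decode the slice in place, without copying it into a temporary array.
        results.Add(Encoding.UTF8.GetString(buffer, start, end - start));
        start = end + 1;
    }
    return results;
}

byte[] data = Encoding.UTF8.GetBytes("foo\0bar\0baz");
List<string> parts = ParseNullTerminatedStrings(data);
Console.WriteLine(string.Join(",", parts));   // foo,bar,baz
```

Searching for a zero byte is safe for UTF-8 because no multi-byte sequence contains 0x00; the same shortcut would not hold for UTF-16 data.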

Here is a solution where you don't have to bother with encoding. I use it in my network class to send binary objects as strings.

public static byte[] String2ByteArray(string str)
{
    char[] chars = str.ToCharArray();
    byte[] bytes = new byte[chars.Length * 2];

    // Copy the two raw UTF-16 bytes of every char.
    for (int i = 0; i < chars.Length; i++)
        Array.Copy(BitConverter.GetBytes(chars[i]), 0, bytes, i * 2, 2);

    return bytes;
}

public static string ByteArray2String(byte[] bytes)
{
    char[] chars = new char[bytes.Length / 2];

    // Rebuild each char from its two raw bytes.
    for (int i = 0; i < chars.Length; i++)
        chars[i] = BitConverter.ToChar(bytes, i * 2);

    return new string(chars);
}

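In addition to the selected answer, if you're using .NET 3.5 or .NET 3.5 CE, you have to specify the index of the first byte to decode and the number of bytes to decode:

string result = System.Text.Encoding.UTF8.GetString(byteArray, 0, byteArray.Length);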

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow