What is the fastest way to convert a possibly null-terminated ASCII byte[] to a string?

https://stackoverflow.com/questions/144176

02-07-2019

Question

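I need to convert a (possibly) null-terminated array of ASCII bytes to a string in C#, and the fastest way I have found is the UnsafeAsciiBytesToString method shown below. It uses the String.String(sbyte*) constructor, whose documentation carries this warning in its remarks:

"The value parameter is assumed to point to an array representing a string encoded using the default ANSI code page (that is, the encoding method specified by Encoding.Default).

Note: Because the default ANSI code page is system-dependent, the string created by this constructor from identical signed byte arrays may differ on different systems. ...

If the specified array is not null-terminated, the behavior of this constructor is system dependent. For example, such a situation might cause an access violation."

Now, I am positive that the way the string is encoded will never change, but the default code page on the system my app is running on might. So, is there any reason I shouldn't run screaming from using String.String(sbyte*) for this purpose?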

using System;
using System.Text;

namespace FastAsciiBytesToString
{
    static class StringEx
    {
        public static string AsciiBytesToString(this byte[] buffer, int offset, int maxLength)
        {
            int maxIndex = offset + maxLength;

            for( int i = offset; i < maxIndex; i++ )
            {
                // Skip non-nulls.
                if( buffer[i] != 0 ) continue;
                // First null we find, return the string.
                return Encoding.ASCII.GetString(buffer, offset, i - offset);
            }
            // Terminating null not found. Convert the entire section from offset to maxLength.
            return Encoding.ASCII.GetString(buffer, offset, maxLength);
        }

        public static string UnsafeAsciiBytesToString(this byte[] buffer, int offset)
        {
            string result = null;

            unsafe
            {
                fixed( byte* pAscii = &buffer[offset] )
                { 
                    result = new String((sbyte*)pAscii);
                }
            }

            return result;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            byte[] asciiBytes = new byte[]{ 0, 0, 0, (byte)'a', (byte)'b', (byte)'c', 0, 0, 0 };

            string result = asciiBytes.AsciiBytesToString(3, 6);

            Console.WriteLine("AsciiBytesToString Result: \"{0}\"", result);

            result = asciiBytes.UnsafeAsciiBytesToString(3);

            Console.WriteLine("UnsafeAsciiBytesToString Result: \"{0}\"", result);

            // Non-null terminated test (may read past the end of the array).
            asciiBytes = new byte[]{ 0, 0, 0, (byte)'a', (byte)'b', (byte)'c' };

            result = asciiBytes.UnsafeAsciiBytesToString(3);

            Console.WriteLine("UnsafeAsciiBytesToString Result: \"{0}\"", result);

            Console.ReadLine();
        }
    }
}

Solution

Is there any reason not to use the String(sbyte*, int, int) constructor? Once you have worked out which portion of the buffer you need, the rest should be simple:

public static string UnsafeAsciiBytesToString(byte[] buffer, int offset, int length)
{
    unsafe
    {
       fixed (byte* pAscii = buffer)
       { 
           return new String((sbyte*)pAscii, offset, length);
       }
    }
}

Or, if you need to look for the terminator first:

public static string UnsafeAsciiBytesToString(byte[] buffer, int offset)
{
    int end = offset;
    while (end < buffer.Length && buffer[end] != 0)
    {
        end++;
    }
    unsafe
    {
       fixed (byte* pAscii = buffer)
       { 
           return new String((sbyte*)pAscii, offset, end - offset);
       }
    }
}

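If this truly is an ASCII string (i.e. all bytes are less than 128), then the code page shouldn't be an issue unless you have a particularly strange default code page that isn't based on ASCII.

Out of interest, have you actually profiled your application to make sure this is really the bottleneck? Do you definitely need the absolute fastest conversion, rather than something more readable (e.g. using Encoding.GetString for the appropriate encoding)?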
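As a point of comparison, the readable route mentioned above could look roughly like the sketch below. The class and method names (AsciiDecode, SafeAsciiBytesToString) are made up for illustration and do not come from the thread; the sketch assumes the caller passes a valid offset and maxLength for the buffer.

```csharp
using System;
using System.Text;

static class AsciiDecode
{
    // Hypothetical helper: decodes ASCII up to the first zero byte,
    // or up to maxLength bytes when no terminator is present.
    public static string SafeAsciiBytesToString(byte[] buffer, int offset, int maxLength)
    {
        // Array.IndexOf returns -1 when no zero byte is found in the range.
        int end = Array.IndexOf(buffer, (byte)0, offset, maxLength);
        int length = end < 0 ? maxLength : end - offset;
        return Encoding.ASCII.GetString(buffer, offset, length);
    }
}
```

Because it never reads past maxLength, a missing terminator just yields a longer string rather than undefined behavior. Whether it is fast enough is something only profiling can answer.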

Other answers

One-liner (assuming the buffer actually contains ONE well-formed null-terminated string):

String MyString = Encoding.ASCII.GetString(MyByteBuffer).TrimEnd((Char)0);

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace TestProject1
{
    class Class1
    {
        public static string cstr_to_string(byte[] data, int code_page)
        {
            Encoding enc = Encoding.GetEncoding(code_page);
            // Search for the first zero byte; FindIndex returns -1 if there is none.
            int inx = Array.FindIndex(data, 0, x => x == 0);
            if (inx >= 0)
                return enc.GetString(data, 0, inx);
            else
                return enc.GetString(data);
        }
}

I don't know about the speed, but I found it easiest to use LINQ to drop everything from the first null onward before decoding:

string s = myEncoding.GetString(bytes.TakeWhile(b => b != 0).ToArray());

int nul = s.IndexOf((char) 0);               // -1 when the string contains no null
if (nul >= 0) s = s.Substring(0, nul);

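One possibility to consider: check that the default code page is acceptable, and use that information to select the conversion mechanism at run time.

This could also take into account whether the string is in fact null-terminated, but once you have done that, of course, the speed gains may vanish.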
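A rough sketch of such a run-time check follows. The names CodePageCheck and IsAsciiCompatible are hypothetical, and the test it applies (the first 128 byte values decode to the same 128 code points) is only one possible definition of "acceptable".

```csharp
using System.Text;

static class CodePageCheck
{
    // Hypothetical helper: true when bytes 0..127 decode to the
    // characters U+0000..U+007F under the given encoding.
    public static bool IsAsciiCompatible(Encoding encoding)
    {
        byte[] probe = new byte[128];
        for (int i = 0; i < probe.Length; i++) probe[i] = (byte)i;

        string decoded = encoding.GetString(probe);
        if (decoded.Length != probe.Length) return false;

        for (int i = 0; i < probe.Length; i++)
        {
            if (decoded[i] != (char)i) return false;
        }
        return true;
    }
}
```

The result of IsAsciiCompatible(Encoding.Default) could be computed once at startup and cached, with the application falling back to Encoding.ASCII.GetString when it is false.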

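An easy, safe, and fast way to convert a byte[] to a string containing its ASCII equivalent, and vice versa, is the .NET class System.Text.Encoding. The class has a static property, Encoding.ASCII, that returns an ASCII encoder.

From string to byte[]: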

string s = "Hello World!";
byte[] b = System.Text.Encoding.ASCII.GetBytes(s);

From byte[] to string:

byte[] byteArray = new byte[] {0x41, 0x42, 0x09, 0x00, 0xFF};  // 0xFF is outside ASCII and decodes as '?'
string s = System.Text.Encoding.ASCII.GetString(byteArray);

This is a little ugly, but you don't have to use unsafe code:

string result = "";
for (int i = 0; i < data.Length && data[i] != 0; i++)
   result += (char)data[i];
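Since repeated string concatenation allocates a new string on every iteration, the same loop could be written with a StringBuilder. This is only a sketch; the AsciiLoop and BytesToStringUntilNull names are invented here. Like the loop above, it casts each byte straight to a char, so bytes of 128 and above become the characters U+0080 to U+00FF rather than '?'.

```csharp
using System.Text;

static class AsciiLoop
{
    // Hypothetical helper: copies bytes into a string until the first
    // zero byte or the end of the array, whichever comes first.
    public static string BytesToStringUntilNull(byte[] data)
    {
        var sb = new StringBuilder(data.Length);
        for (int i = 0; i < data.Length && data[i] != 0; i++)
        {
            sb.Append((char)data[i]);
        }
        return sb.ToString();
    }
}
```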

License: CC-BY-SA with attribution
