0x00 in a binary file with VB.NET

https://stackoverflow.com/questions/1353340

20-09-2019

Question

Updated below.

I am reading a binary file with a BinaryReader in VB.NET. Each row in the file has the following structure:

    "Category" = 1 byte
    "Code" = 1 byte
    "Text" = 60 Bytes

    Dim Category As Byte
    Dim Code As Byte
    Dim byText() As Byte
    Dim chText() As Char
    Dim br As New BinaryReader(fs)

    Category = br.ReadByte()
    Code = br.ReadByte()
    byText = br.ReadBytes(60)
    chText = encASCII.GetChars(byText)

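The problem is that the "Text" field contains some odd characters used for padding. Most of them appear to be 0x00 null characters.

Is there a way to get rid of these 0x00 characters through the encoding?
If not, how can I do a replace on the chText array to remove the 0x00 characters? I am trying to serialize the resulting DataTable to XML, and it fails on these non-compliant characters. I can loop through the array, but I cannot figure out how to do the replace.

Update:

I have gotten this far thanks to a lot of help from everyone below. The first solution works, but is not as flexible as I had hoped. The second solution fails for one use case, but is more general.

Ad 1) I can solve the problem by passing the string to this function: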

    Public Function StripBad(ByVal InString As String) As String
        Dim str As String = InString
        Dim sb As New System.Text.StringBuilder

        For Each ch As Char In str
            ' Replace every character at or below 0x25 (which includes 0x00) with a space.
            If StrComp(ChrW(Val("&H25")), ch) >= 0 Then
                ch = " "c
            End If
            sb.Append(ch)
        Next

        Return sb.ToString()
    End Function

Ad 2) This routine strips out some of the offending characters, but fails for 0x00. It is adapted from MSDN: http://msdn.microsoft.com/en-us/library/kdcak6ye.aspx

    Public Function StripBadwithConvert(ByVal InString As String) As String
        Dim unicodeString As String
        unicodeString = InString
        ' Create two different encodings.
        Dim ascii As Encoding = Encoding.ASCII
        Dim [unicode] As Encoding = Encoding.UTF8

        ' Convert the string into a byte[].
        Dim unicodeBytes As Byte() = [unicode].GetBytes(unicodeString)

        Dim asciiBytes As Byte() = Encoding.Convert([unicode], ascii, unicodeBytes)

        Dim asciiChars(ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length) - 1) As Char
        ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0)
        Dim asciiString As New String(asciiChars)

        Return asciiString
    End Function
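
One likely reason the second routine leaves 0x00 in place is that NUL (code point 0) is itself a valid 7-bit ASCII character, so a conversion to ASCII has nothing to replace. The sketch below illustrates this under that assumption; the module name `NulDemo` and the helper `CountNuls` are hypothetical names used only for this example.

```vbnet
Imports System
Imports System.Text

Module NulDemo
    ' Hypothetical helper: counts the NUL characters left in a string.
    Function CountNuls(ByVal s As String) As Integer
        Dim count As Integer = 0
        For Each ch As Char In s
            If ch = ChrW(0) Then count += 1
        Next
        Return count
    End Function

    Sub Main()
        ' "AB", two NUL padding characters, and one non-ASCII character (U+00E9).
        Dim source As String = "AB" & ChrW(0) & ChrW(0) & ChrW(&HE9)
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes(source)
        Dim asciiBytes As Byte() = Encoding.Convert(Encoding.UTF8, Encoding.ASCII, utf8Bytes)
        Dim result As String = Encoding.ASCII.GetString(asciiBytes)

        ' The non-ASCII character is replaced by the ASCII fallback,
        ' but the NUL characters pass through the conversion unchanged.
        Console.WriteLine("NULs remaining: " & CountNuls(result))
        Console.WriteLine("Last character code: " & AscW(result(result.Length - 1)))
    End Sub
End Module
```

So the conversion only helps with characters outside the ASCII range; the padding has to be removed explicitly.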

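Solution

First you need to find out what the format of the text is, so that you are not blindly removing something without knowing what you hit.

Depending on the format, you use different methods to remove the characters.

To remove only the zero characters: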

    ' Compact the array in place, skipping every zero byte.
    Dim len As Integer = 0
    For pos As Integer = 0 To byText.Length - 1
        If byText(pos) <> 0 Then
            byText(len) = byText(pos)
            len += 1
        End If
    Next
    ' Decode only the bytes that were kept.
    strText = Encoding.ASCII.GetString(byText, 0, len)

To remove everything from the first zero character to the end of the array:

    ' Count the bytes that precede the first zero byte.
    Dim len As Integer = 0
    While len < byText.Length AndAlso byText(len) <> 0
        len += 1
    End While
    strText = Encoding.ASCII.GetString(byText, 0, len)

Edit:
If you just want to keep whatever junk happens to be ASCII characters:

    ' Keep only the bytes in the range 32..127 and drop everything else.
    Dim len As Integer = 0
    For pos As Integer = 0 To byText.Length - 1
        If byText(pos) >= 32 AndAlso byText(pos) <= 127 Then
            byText(len) = byText(pos)
            len += 1
        End If
    Next
    strText = Encoding.ASCII.GetString(byText, 0, len)
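
As a rough sketch, the "cut at the first zero" variant could be packaged as a function for the 60-byte field from the question. This assumes the text is ASCII and is padded with zeros on the right; `PaddedTextDemo` and `ReadPaddedText` are hypothetical names, not part of the original code.

```vbnet
Imports System
Imports System.Text

Module PaddedTextDemo
    ' Hypothetical helper: decodes a fixed-width ASCII field,
    ' stopping at the first 0x00 byte (or at the end of the field).
    Function ReadPaddedText(ByVal field As Byte()) As String
        Dim length As Integer = 0
        While length < field.Length AndAlso field(length) <> 0
            length += 1
        End While
        Return Encoding.ASCII.GetString(field, 0, length)
    End Function

    Sub Main()
        ' Simulate a 60-byte "Text" field: "Hello" followed by zero padding.
        Dim field(59) As Byte
        Encoding.ASCII.GetBytes("Hello").CopyTo(field, 0)
        Console.WriteLine("[" & ReadPaddedText(field) & "]")   ' prints "[Hello]"
    End Sub
End Module
```

In the reading loop from the question, the call would take the result of `br.ReadBytes(60)` as its argument.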

Other tips

If the null characters are used as right padding (i.e. as a terminator) for the text, which would be the normal case, this is fairly easy:

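    Dim strText As String = encASCII.GetString(byText)
    ' Index of the first null character, or -1 if there is none.
    Dim strlen As Integer = strText.IndexOf(Chr(0))
    If strlen <> -1 Then
        ' Keep only the characters before the first null.
        strText = strText.Substring(0, strlen)
    End If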

If not, you can still do a normal Replace on the string. It is slightly "cleaner" to do the pruning in the byte array, before converting it to a string. The principle remains the same, though.

    ' Index of the first zero byte, or -1 if there is none.
    Dim strlen As Integer = Array.IndexOf(byText, CByte(0))
    If strlen = -1 Then
        ' No padding found: use the whole array.
        strlen = byText.Length
    End If
    Dim strText = encASCII.GetString(byText, 0, strlen)
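
For the case where the null characters are not just trailing padding, the plain string Replace mentioned above might look like the following sketch. `ReplaceNulDemo` and `RemoveNuls` are hypothetical names chosen for this example.

```vbnet
Imports System
Imports System.Text

Module ReplaceNulDemo
    ' Hypothetical helper: removes every NUL character, wherever it occurs.
    Function RemoveNuls(ByVal s As String) As String
        Return s.Replace(ChrW(0).ToString(), String.Empty)
    End Function

    Sub Main()
        ' "H", NUL, "i", NUL, NUL
        Dim bytes As Byte() = {72, 0, 105, 0, 0}
        Dim decoded As String = Encoding.ASCII.GetString(bytes)
        Console.WriteLine(decoded.Length)               ' 5 characters, including the NULs
        Console.WriteLine(RemoveNuls(decoded).Length)   ' 2 characters remain
    End Sub
End Module
```

Unlike cutting at the first null, this keeps any text that follows an embedded 0x00, so which approach fits depends on the file format.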

You can use a struct to load the data:

[System.Runtime.InteropServices.StructLayout(System.Runtime.InteropServices.LayoutKind.Explicit)]
internal struct TextFileRecord
{
    [System.Runtime.InteropServices.FieldOffset(0)]
    public byte Category;
    [System.Runtime.InteropServices.FieldOffset( 1 )]
    public byte Code;
    [System.Runtime.InteropServices.FieldOffset( 2 )]
    [System.Runtime.InteropServices.MarshalAs(System.Runtime.InteropServices.UnmanagedType.LPTStr, SizeConst=60)]
    public string Text;
}

You will need to adjust the UnmanagedType argument to match the encoding of your strings.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow
