Question

I have a file that uses non-ASCII characters. When I save this file using a FileStream, the characters that end up in the file are not what I expect.

I write:

stream
BT 38.3774 710 TD /F10 12.0000 Tf (België)Tj ET
endstream

What ends up in the file is:

stream
BT 38.3774 710 TD /F10 12.0000 Tf (België)Tj ET
endstream

The strings are UTF-8 encoded into bytes before I use FileStream.Write to save them to the file.

Can someone help me understand why this happens?

I have been able to reproduce the result in a short version of the code:

' Requires Imports System.IO and Imports System.Text
Using newFile As New FileStream("C:\Users\Sed\Documents\test.txt", FileMode.Create)
    Dim content As String = "België"
    ' First write the string as UTF-32 (4 bytes per character)...
    Dim contentByte As Byte() = New UTF32Encoding().GetBytes(content)
    newFile.Write(contentByte, 0, contentByte.Length)
    ' ...then write the same string again as UTF-8.
    contentByte = New UTF8Encoding().GetBytes(content)
    newFile.Write(contentByte, 0, contentByte.Length)
End Using

giving the result

B   e   l   g   i   ë   België

So I suspect that the FileStream somehow assumes the content is UTF-32 encoded while it is actually being written as UTF-8...

Encoding everything as UTF-32 does not fix it. The file is then completely garbled.

I still don't understand why this happens, but I have a workaround in mind that I need to explore.


Solution

I have figured it out...

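If you create a file the way I did, the FileStream writes the bytes exactly as given, and the file is then read as ANSI, i.e. Encoding.Default. The strings therefore have to be encoded with that same encoding.

So changing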

Dim newObjectByte As Byte() = New UTF8Encoding(True).GetBytes(DataObject("pdfObjectString").ToString())

to

Dim newObjectByte As Byte() = Encoding.Default.GetBytes(DataObject("pdfObjectString").ToString())

solved my problem with the code page.
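
To see why the choice of encoding matters, it helps to look at the bytes each encoding produces. The sketch below is a minimal illustration, not the code from the original program: the module name `EncodingMismatchDemo`, the helper `ToHex`, and the use of ISO-8859-1 as a stand-in for the ANSI code page are all assumptions made for the example. Note also that `Encoding.Default` is the system ANSI code page on .NET Framework but UTF-8 on .NET Core and .NET 5+, so the fix above relies on the .NET Framework behaviour.

```vb
Imports System
Imports System.Text

Module EncodingMismatchDemo
    ' "België" built with ChrW so the demo does not depend on how this source file is saved.
    Private ReadOnly Sample As String = "Belgi" & ChrW(&HEB)

    ' Assumption: ISO-8859-1 stands in for the ANSI code page. It maps ë to the
    ' single byte &HEB, as Windows-1252 does.
    Private ReadOnly Ansi As Encoding = Encoding.GetEncoding("iso-8859-1")

    ' Formats a byte array as space-separated hex (hypothetical helper).
    Function ToHex(bytes As Byte()) As String
        Return BitConverter.ToString(bytes).Replace("-", " ")
    End Function

    Sub Main()
        Dim utf8Bytes As Byte() = New UTF8Encoding(False).GetBytes(Sample)
        Dim ansiBytes As Byte() = Ansi.GetBytes(Sample)

        ' UTF-8 needs two bytes (C3 AB) for ë, the single-byte code page needs one (EB).
        Console.WriteLine("UTF-8 bytes: " & ToHex(utf8Bytes))
        Console.WriteLine("ANSI bytes:  " & ToHex(ansiBytes))

        ' Reading the UTF-8 bytes back as ANSI turns the two-byte ë into two characters.
        Dim misread As String = Ansi.GetString(utf8Bytes)
        Console.WriteLine("UTF-8 read as ANSI: " & misread.Length.ToString() & " characters")

        ' Bytes written and read with the same encoding round-trip cleanly.
        Dim roundTrip As Boolean = (Ansi.GetString(ansiBytes) = Sample)
        Console.WriteLine("ANSI round trip OK: " & roundTrip.ToString())
    End Sub
End Module
```

The general point is that a FileStream stores bytes without any encoding information, so whatever later reads the file has to decode them with the same encoding that produced them.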

Thanks to the article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)", which got me thinking about code pages, ANSI, ASCII and all that stuff...
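
The spaced-out letters in the question's reproduction can be explained with the same byte-level view. As a rough sketch (the module name `Utf32LayoutDemo` and the helper `CountZeroBytes` are hypothetical, introduced only for this illustration), dumping the UTF-32 bytes shows that each character is followed by three zero bytes, which a viewer expecting a single-byte encoding will typically render as gaps:

```vb
Imports System
Imports System.Text

Module Utf32LayoutDemo
    ' Counts the zero bytes in a buffer (hypothetical helper for this demo).
    Function CountZeroBytes(bytes As Byte()) As Integer
        Dim count As Integer = 0
        For Each b As Byte In bytes
            If b = 0 Then count += 1
        Next
        Return count
    End Function

    Sub Main()
        ' "België" built with ChrW so the demo does not depend on the source file's encoding.
        Dim content As String = "Belgi" & ChrW(&HEB)

        ' New UTF32Encoding() is little-endian: every character here becomes
        ' its code point byte followed by three zero bytes.
        Dim utf32Bytes As Byte() = New UTF32Encoding().GetBytes(content)

        Console.WriteLine(BitConverter.ToString(utf32Bytes))
        Console.WriteLine("Total bytes: " & utf32Bytes.Length.ToString())
        Console.WriteLine("Zero bytes:  " & CountZeroBytes(utf32Bytes).ToString())
    End Sub
End Module
```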

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow