What default encoding is used when using StreamWriter to write into a file with no Encoding parameter?

StackOverflow https://stackoverflow.com/questions/23298126

  •  09-07-2023

Question

I had a situation where we produce a file for our client, and the file would contain some special characters like accented i or a (í, á) etc.

Originally, we used this code to open the file for output:

Using sw As StreamWriter = New StreamWriter(fullpath, True)

However, the í and á would show up in the file as two-byte sequences: hex c3 ad for the í and c3 a1 for the á.

We fixed the issue by enforcing the Windows-1252 encoding when writing to the file (which is the same as Encoding.Default, but according to MSDN we should NOT be using Encoding.Default):

Using sw As StreamWriter = New StreamWriter(fullpath, True, Text.Encoding.GetEncoding(1252))

Question 1: if Encoding.Default is not really the default encoding when no Encoding parameter is supplied, which encoding is the default default (pardon the pun) encoding?

Question 2 (probably the same answer as Question 1): what is the default default encoding for StreamReader if you don't specify an Encoding parameter?


Solution

Well, you didn't really fix the issue. To get "c3 ad for the í" you must use Encoding.UTF8.

That is what StreamWriter already uses by default. However, it uses the UTF8Encoding constructor that takes the encoderShouldEmitUTF8Identifier argument and passes False for it, so no BOM (Byte Order Mark) is written. The BOM tells the program that reads the file unequivocally which Unicode encoding is used. Sadly, Microsoft cannot force a BOM because the Unicode consortium, in a highly uncharacteristic moment of temporary insanity, made the BOM optional.
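A minimal sketch of the difference, assuming a plain console program. The module name, the HexDump helper, and the temp files are made up for illustration and are not part of the original code. It writes the single character í twice, once with no Encoding argument and once with Encoding.UTF8, and dumps the bytes that end up on disk.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module DefaultEncodingDemo
    ' Hypothetical helper: file contents as space-separated hex.
    Function HexDump(fileName As String) As String
        Return BitConverter.ToString(File.ReadAllBytes(fileName)).Replace("-", " ")
    End Function

    Sub Main()
        Dim accentedI As Char = Convert.ToChar(&HED) ' í, U+00ED
        Dim defaultFile As String = Path.GetTempFileName()
        Dim bomFile As String = Path.GetTempFileName()

        ' No Encoding argument: UTF-8 without a BOM.
        Using sw As New StreamWriter(defaultFile, False)
            sw.Write(accentedI)
        End Using

        ' Encoding.UTF8: UTF-8 with a BOM at the start of the file.
        Using sw As New StreamWriter(bomFile, False, Encoding.UTF8)
            sw.Write(accentedI)
        End Using

        Console.WriteLine(HexDump(defaultFile)) ' C3 AD
        Console.WriteLine(HexDump(bomFile))     ' EF BB BF C3 AD

        File.Delete(defaultFile)
        File.Delete(bomFile)
    End Sub
End Module
```

Both files hold the same c3 ad bytes for the í; the only difference is the three-byte ef bb bf prefix that identifies the file as UTF-8.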

Your change probably works now because the program that reads the file falls back to the system's default encoding when it can't find a BOM. You might have guessed correctly at 1252; it is common, but it is certainly no guarantee. Fix:

  Using sw As StreamWriter = New StreamWriter(fullpath, True, Encoding.UTF8)

Do beware of the True argument you use, which appends text to the file. If the file already contains text, the BOM can no longer be added. It is also a rather nasty problem if the file was started with a different encoding; you certainly don't want to end up with a mix. Do everything you can to avoid having to use True.
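The append caveat can be sketched as follows, again assuming a console program with an illustrative module name and a temp file standing in for the real output path. It starts a file without a BOM, appends to it with Encoding.UTF8, and then rewrites it from scratch.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module AppendBomDemo
    Sub Main()
        Dim fileName As String = Path.GetTempFileName()

        ' Start the file without a BOM (no Encoding argument).
        Using sw As New StreamWriter(fileName, False)
            sw.Write("a")
        End Using

        ' Append with Encoding.UTF8: the file is not empty, so no BOM is written.
        Using sw As New StreamWriter(fileName, True, Encoding.UTF8)
            sw.Write("b")
        End Using

        Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(fileName))) ' 61-62

        ' Overwrite instead of append: the BOM goes first again.
        Using sw As New StreamWriter(fileName, False, Encoding.UTF8)
            sw.Write("ab")
        End Using

        Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(fileName))) ' EF-BB-BF-61-62

        File.Delete(fileName)
    End Sub
End Module
```

Passing False and writing the whole file in one go is the simplest way to guarantee that the BOM and the content agree on a single encoding.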

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow