How do you read a text file without losing odd characters?

https://stackoverflow.com/questions/1804895

05-07-2019

Question

I would like to read a text file into an array of strings using System.IO.File.ReadAllLines. However, ReadAllLines strips out some odd characters that I would like to keep, such as chr(187). I've tried several encoding options, but none of them helps, and I don't see an option for "no encoding."

I can use FileOpen and LineInput to read the file without modification, but that is considerably slower. FileSystemObject also works correctly, but I would rather not use it.

What is the best way to read a text file into an array of strings without modification in .NET?

Solution

There's no such concept as "no encoding". You must find out the right encoding; otherwise you can't possibly interpret the data correctly.

When you say "chr(187)" what Unicode character do you mean?

Some encodings you might want to try:

Encoding.Default - the system default encoding
Encoding.GetEncoding(28591) - ISO-Latin-1
Encoding.UTF8 - very common in modern files
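To see why the choice matters, here is a small language-neutral sketch in Python (not .NET code). It decodes the same three bytes with three codecs. It assumes Encoding.Default corresponds to Windows-1252 ("cp1252"), which is typical on Western Windows systems but not guaranteed; "latin-1" plays the role of code page 28591 and "utf-8" that of Encoding.UTF8. The function names are hypothetical.

```python
# Language-neutral illustration in Python (not .NET): decode the same bytes
# with three encodings. Assumption: Encoding.Default is treated as
# Windows-1252 ("cp1252"), typical of Western Windows but not guaranteed.
ENCODINGS = ("cp1252", "latin-1", "utf-8")

def decode_all(raw: bytes) -> dict:
    """Map each encoding name to the decoded text; invalid bytes become U+FFFD."""
    return {name: raw.decode(name, errors="replace") for name in ENCODINGS}

def code_points(text: str) -> str:
    """Render text as U+XXXX code points so the output stays ASCII-only."""
    return " ".join(f"U+{ord(c):04X}" for c in text)

data = bytes([0x41, 0x80, 0xBB])  # 'A', then two bytes outside ASCII

for name, text in decode_all(data).items():
    print(f"{name:8} {code_points(text)}")
```

Windows-1252 and Latin-1 agree on byte 0xBB (187, the character "»") but disagree on 0x80 ("€" versus the control character U+0080), while UTF-8 rejects both bytes as invalid on their own. Only knowing the file's real encoding tells you which reading is right.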

OTHER TIPS

It sounds like you want to read the raw bytes.

Use File.ReadAllBytes to read them into an array (avoid this for large files, since the whole file is loaded into memory at once), or use a FileStream to read chunks of bytes at a time.
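The chunked approach can be sketched in Python as an illustration (not .NET code): a binary-mode file object stands in for a FileStream, and the function names and the 64 KiB default chunk size are hypothetical choices. Because nothing is decoded, every byte, including a BOM, comes through unchanged.

```python
import io
from typing import BinaryIO, Iterator

def read_chunks(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield raw byte chunks until EOF; nothing is decoded, so no byte is altered."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object means end of stream
            break
        yield chunk

def read_all_bytes(path: str) -> bytes:
    """Whole-file read, analogous to File.ReadAllBytes; suitable for small files only."""
    with open(path, "rb") as f:  # binary mode: no decoding, no newline translation
        return f.read()

# Usage with an in-memory stream standing in for a file (hypothetical contents).
sample = io.BytesIO(b"\xef\xbb\xbfhello\r\n")
for chunk in read_chunks(sample, chunk_size=4):
    print(chunk.hex(" "))
```

The trade-off is the usual one: a single whole-file read is simplest, while a fixed chunk size keeps memory use bounded regardless of file size.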

The characters that were stripped out were at the beginning of the file. It turns out they were the UTF-8 byte order mark (BOM), the three bytes EF BB BF; the middle byte, 0xBB, is 187 in decimal, which is where the chr(187) came from. File.ReadAllLines and File.ReadAllText strip the BOM, while LineInput and the FileSystemObject functions do not.
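The effect can be reproduced outside .NET. This Python sketch is an illustration only (not .NET code; the sample contents and function names are hypothetical). It shows that plain UTF-8 decoding keeps the BOM as the character U+FEFF at the start of the first line, that Python's "utf-8-sig" codec drops it, and that the three BOM bytes read as Windows-1252 are code points 239, 187 and 191.

```python
import codecs

# Hypothetical file contents: a UTF-8 BOM followed by two CRLF-terminated lines.
raw = codecs.BOM_UTF8 + "line one\r\nline two\r\n".encode("utf-8")

def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def lines_keep_bom(data: bytes) -> list[str]:
    """Plain UTF-8 decoding: the BOM survives as U+FEFF at the start of line 1."""
    return data.decode("utf-8").splitlines()

def lines_strip_bom(data: bytes) -> list[str]:
    """The 'utf-8-sig' codec drops a leading BOM, mirroring what ReadAllLines did here."""
    return data.decode("utf-8-sig").splitlines()

print(has_utf8_bom(raw))
print([f"U+{ord(c):04X}" for c in lines_keep_bom(raw)[0][:2]])
print(lines_strip_bom(raw))
# Interpreted as Windows-1252, the BOM bytes are code points 239, 187, 191.
print([ord(c) for c in raw[:3].decode("cp1252")])
```

If the goal is to keep the BOM, decode without BOM handling (or keep the raw bytes) and deal with the leading U+FEFF explicitly.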

If I had explained in the question that the odd characters were at the beginning of the file, I probably would have gotten a quick answer. I'll give Jon Skeet credit for the best answer to the question as I posed it.

Licensed under: CC-BY-SA with attribution
