How do you read a text file without losing odd characters?
Question
I would like to read a text file into an array of strings using System.IO.File.ReadAllLines. However, ReadAllLines strips out some odd characters in the file that I would like to keep, such as chr(187). I've tried some different encoding options, but that doesn't help and I don't see an option for "no encoding."
I can use FileOpen and LineInput to read the file without modification, but this is quite a bit slower. Using FileSystemObject also works properly, but I would rather not use that.
What is the best way to read a text file into an array of strings without modification in .NET?
Solution
There's no such concept as "no encoding". You must find out the right encoding, otherwise you can't possibly interpret the data correctly.
When you say "chr(187)" what Unicode character do you mean?
Some encodings you might want to try:
- Encoding.Default - the system default encoding (the ANSI code page on .NET Framework)
- Encoding.GetEncoding(28591) - ISO-8859-1 (Latin-1)
- Encoding.UTF8 - very common in modern files
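As a rough sketch, each of these encodings can be passed as the second argument to File.ReadAllLines to compare the results. The path `input.txt` is a placeholder, not something from the question.

```csharp
using System;
using System.IO;
using System.Text;

class EncodingTrial
{
    static void Main()
    {
        string path = "input.txt"; // placeholder path (assumption)

        // The candidate encodings from the list above.
        Encoding[] candidates =
        {
            Encoding.Default,
            Encoding.GetEncoding(28591), // ISO-8859-1 (Latin-1)
            Encoding.UTF8
        };

        foreach (Encoding encoding in candidates)
        {
            // Decode the whole file with this encoding and show the first line.
            string[] lines = File.ReadAllLines(path, encoding);
            string first = lines.Length > 0 ? lines[0] : "(empty file)";
            Console.WriteLine($"{encoding.WebName}: {first}");
        }
    }
}
```

Comparing the first line under each encoding is usually enough to see which one decodes the odd characters as intended.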
OTHER TIPS
It sounds like you want to read the raw bytes.
Use File.ReadAllBytes to read them into an array (don't do this for large files), or use a FileStream to read chunks of bytes at a time.
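A minimal sketch of both approaches, assuming a placeholder path `input.txt` and an arbitrary 4096-byte buffer size:

```csharp
using System;
using System.IO;

class RawBytes
{
    static void Main()
    {
        string path = "input.txt"; // placeholder path (assumption)

        // Whole file at once: fine for small files only.
        byte[] bytes = File.ReadAllBytes(path);
        int preview = Math.Min(bytes.Length, 8);
        // Print the first few bytes in hex to see exactly what is in the file.
        Console.WriteLine(BitConverter.ToString(bytes, 0, preview));

        // Chunked read: memory use stays bounded regardless of file size.
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] buffer = new byte[4096]; // buffer size is an arbitrary choice
            long total = 0;
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Process buffer[0..read) here; this sketch only counts bytes.
                total += read;
            }
            Console.WriteLine($"{total} bytes read in chunks");
        }
    }
}
```

Printing the leading bytes in hex is a quick way to check whether the file starts with something unexpected.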
The characters that were stripped out were at the beginning of the file. It turns out they were the UTF-8 byte order mark (the bytes EF BB BF; chr(187) is the middle byte, 0xBB). File.ReadAllLines and File.ReadAllText strip out the byte order mark, while the LineInput and FileSystemObject functions do not.
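If the goal is to keep those leading bytes as characters, one possible approach (a sketch, not something proposed in the answers above) is to read through a StreamReader with byte order mark detection turned off and a single-byte encoding such as ISO-8859-1. The helper name `ReadLinesKeepingBom` is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class KeepBom
{
    // Hypothetical helper: maps every byte to one char and skips BOM detection.
    static string[] ReadLinesKeepingBom(string path)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(
            path, Encoding.GetEncoding(28591), detectEncodingFromByteOrderMarks: false))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines.ToArray();
    }

    static void Main()
    {
        // Write a small test file: UTF-8 BOM followed by "hi".
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 0xEF, 0xBB, 0xBF, (byte)'h', (byte)'i' });

        string stripped = File.ReadAllLines(path)[0];
        string kept = ReadLinesKeepingBom(path)[0];

        Console.WriteLine(stripped.Length); // 2: the BOM was removed
        Console.WriteLine(kept.Length);     // 5: three BOM bytes plus "hi"
        Console.WriteLine((int)kept[1]);    // 187: the chr(187) from the question

        File.Delete(path);
    }
}
```

Note that this treats the file as single-byte text, so any real multi-byte UTF-8 sequences later in the file would come through as separate characters rather than being decoded.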
If I had explained in the question that the odd characters were at the beginning of the file, I imagine I would have gotten a quick answer. I'll give Jon Skeet credit for the best answer to the question I posed.