Question

I'm going to be reading and parsing the EML files dropped by the Microsoft SMTP service. I am a newbie to using the various stream classes. The implementation I have seen that parses these files uses a variation on System.IO.Stream to read byte by byte. However, it seems like these files should never be anything but text. Wouldn't it be better to use a StreamReader? And if so, is there any reason to use something other than the default (UTF-8) encoding?


Solution

They should be text, but they aren't always.

Emails can use the 8bit or binary Content-Transfer-Encoding, so the bytes in an EML file are not guaranteed to be 7-bit ASCII text.
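The following small sketch shows what "not always text" looks like at the byte level. It is written in Python for brevity; the same check works on a byte[] in .NET. The sample message and the `non_ascii_offsets` helper are made up for illustration. It flags every byte outside the 7-bit range.

```python
# Made-up raw message whose body uses the 8bit transfer encoding.
# The body contains the byte 0xE9 ("é" in ISO-8859-1), which is not 7-bit ASCII.
RAW = (
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
)


def non_ascii_offsets(data: bytes) -> list[int]:
    """Hypothetical helper: offsets of every byte at or above 0x80."""
    return [i for i, b in enumerate(data) if b >= 0x80]


print(RAW.isascii())            # False: the file is not pure 7-bit text
print(non_ascii_offsets(RAW))   # position of the 0xE9 byte in the body
```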

A StreamReader will work for about 99% of the emails you want to parse.

However, quite honestly, that's not the biggest problem.

The real problem is parsing and extracting the MIME content according to the MIME rules, and decoding each part with the correct character set.
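Below is a hedged Python sketch of the MIME side, using the standard-library `email` package. It illustrates the idea rather than a .NET implementation, and the two-part message is invented. Each part declares its own charset and transfer encoding, so the parser has to decode part by part instead of applying one encoding to the whole file.

```python
import email
from email import policy

# Invented two-part message: the same word "café" stored once as 8bit
# ISO-8859-1 and once as base64-encoded UTF-8.
RAW = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: multipart/alternative; boundary=XYZ\r\n"
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"Y2Fmw6k=\r\n"
    b"--XYZ--\r\n"
)


def text_parts(raw: bytes) -> list[tuple[str, str]]:
    """Return (charset, decoded text) for every text/* part."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    result = []
    for part in msg.walk():  # walk() also yields the multipart container
        if part.get_content_maintype() == "text":
            # get_content() undoes the transfer encoding, then decodes
            # the bytes with the charset declared by this part.
            result.append((part.get_content_charset(), part.get_content()))
    return result


for charset, text in text_parts(RAW):
    print(charset, repr(text))
```

Both parts decode to the same text even though their bytes differ. A single StreamReader with one fixed encoding cannot achieve that.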

Although UTF-8 can encode every Unicode character and will decode the majority of emails correctly, decoding a message that was actually written in another character set (for example ISO-8859-1 or Windows-1252) as UTF-8 will corrupt its non-ASCII characters.

The best way to do this is to read the email as binary, extract the charset parameter from the Content-Type header, and then decode the content using that character set.
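The binary-then-decode approach can be sketched as follows. It is in Python for brevity; in .NET it roughly corresponds to reading a byte[] and decoding the body with `Encoding.GetEncoding(charset)`. `decode_message` is a hypothetical helper. It only consults the top-level Content-Type header and ignores header folding and base64/quoted-printable transfer encodings. The `us-ascii` fallback is the MIME default when no charset is declared.

```python
import re

# Matches charset=xyz or charset="xyz" inside a Content-Type header line.
CHARSET_RE = re.compile(rb'charset\s*=\s*"?([A-Za-z0-9_.:+-]+)"?', re.IGNORECASE)


def decode_message(raw: bytes, fallback: str = "us-ascii") -> tuple[str, str]:
    """Hypothetical helper: return (charset, decoded body) for a single-part message."""
    # Step 1: work on raw bytes; the header block ends at the first blank line.
    match = re.search(rb"\r?\n\r?\n", raw)
    if match:
        header_bytes, body = raw[:match.start()], raw[match.end():]
    else:
        header_bytes, body = raw, b""

    # Step 2: pull the charset parameter out of the Content-Type header.
    charset = fallback
    for line in header_bytes.splitlines():
        if line.lower().startswith(b"content-type:"):
            m = CHARSET_RE.search(line)
            if m:
                charset = m.group(1).decode("ascii").lower()
            break

    # Step 3: decode the body with the declared charset.
    try:
        return charset, body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or mismatched bytes: ISO-8859-1 maps every
        # byte to a character, so this fallback never raises.
        return charset, body.decode("iso-8859-1")


raw = b"Content-Type: text/plain; charset=iso-8859-1\r\n\r\ncaf\xe9"
print(decode_message(raw))   # ('iso-8859-1', 'café')
```

In a multipart message each part carries its own Content-Type and charset, so a full parser has to repeat this per part rather than once per file.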

Licensed under: CC-BY-SA with attribution