Question

I use the Lotus Notes NotesMIMEEntity to parse and convert emails. NotesMIMEEntity has a property "Encoding" which should give me the encoding of the current message. Normally this works well, but for some messages I get the (valid) result "none" (1725).

Is there a default encoding, such as quoted-printable, that I can assume when decoding such messages? How can I determine which format the message is in?

I checked the relevant RFCs but was unable to find any clear information. A related question: is a heuristic approach really necessary here?

Help is really appreciated.

Solution

If it's coming back with ENC_NONE (1725), that would suggest that the sending software omitted the Content-Transfer-Encoding header, or screwed it up in some way that made it unreadable.
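To look at the same situation outside Notes, here is a minimal Python sketch using the standard `email` package. It reports a message's Content-Transfer-Encoding and returns None when the header is missing or holds an unrecognized value, which is how this answer reads ENC_NONE. The function name `transfer_encoding` and the accepted-value set are illustrative assumptions. This is not the Notes API.

```python
from email import message_from_bytes
from typing import Optional

# Values defined by RFC 2045 (assumption: x-token extensions are treated as unknown).
KNOWN_ENCODINGS = {"7bit", "8bit", "binary", "quoted-printable", "base64"}


def transfer_encoding(raw: bytes) -> Optional[str]:
    """Return the lower-cased Content-Transfer-Encoding, or None if the
    header is absent or unreadable (hypothetical helper)."""
    msg = message_from_bytes(raw)
    value = msg.get("Content-Transfer-Encoding")
    if value is None:
        # Header omitted by the sender; the caller decides what that means.
        return None
    value = str(value).strip().lower()
    # A garbled value is reported the same way as a missing header.
    return value if value in KNOWN_ENCODINGS else None
```

A None result tells you only that the header gave no usable answer. It does not say how the body was actually produced.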

Given that, you either have to assume it's not encoded at all, or you have to assume that the sending side really did encode it but forgot to set the header. In the latter case, you might want to try to guess which encoding it is, i.e., the heuristic approach. Is that scenario really likely for the messages you're dealing with? I can't answer that for you. I also can't answer whether you even need to know the encoding. It depends on what your requirements are and what you're trying to do with the data. (E.g., if the requirement is that your Lotus Notes user must see the message exactly as someone who wasn't using Lotus Notes saw it, the problem is that what that person saw depends on the assumptions their software made! You can't really know!)
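If you do go the heuristic route, a guess could look something like the following Python sketch. `guess_encoding`, its rules, and the 16-character minimum are hypothetical choices, not an established algorithm. The sketch also shows why the approach is fragile: a short body made only of letters and digits could pass the base64 test, and a literal "=41" in plain text would be taken for quoted-printable.

```python
import base64
import binascii
import re

# Hypothetical heuristic; thresholds and patterns are illustrative assumptions.
_B64_LINE = re.compile(rb"^[A-Za-z0-9+/]+={0,2}$")
_QP_ESCAPE = re.compile(rb"=[0-9A-F]{2}|=\r?\n")  # hex escape or soft line break


def guess_encoding(body: bytes) -> str:
    """Guess a transfer encoding for a body whose header is missing."""
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    joined = b"".join(lines)
    # Base64: every line uses only the base64 alphabet and the whole thing decodes.
    if len(joined) >= 16 and len(joined) % 4 == 0 and all(_B64_LINE.match(ln) for ln in lines):
        try:
            base64.b64decode(joined, validate=True)
            return "base64"
        except binascii.Error:
            pass
    # Quoted-printable: look for =XX escapes or soft line breaks.
    if _QP_ESCAPE.search(body):
        return "quoted-printable"
    # Otherwise treat it as unencoded; only the 7bit/8bit hint differs.
    return "7bit" if all(b < 0x80 for b in body) else "8bit"
```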

Frankly, I would just go with the former interpretation unless someone specifically showed me a message that was encoded but had a bad Content-Transfer-Encoding header, and could also give a rational reason why the receiving software should try to fix a message that was broken on the sending side.

BTW: bear in mind that 7bit, 8bit and binary all mean that the data is not encoded. The difference between them is just a "hint" to the receiving system that if the data is re-transmitted via a different method, it might need to be encoded. In all of these cases, however, the right thing to do with the data is to copy it without applying any transformation (unless, of course, you receive 8bit data but you're running in a 7-bit environment).
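The rule above amounts to a small dispatch. In this minimal Python sketch, 7bit, 8bit and binary pass through unchanged, and only quoted-printable and base64 are decoded. It assumes you already hold the raw body bytes and the header value. `decode_body` is a hypothetical name, and treating a missing header (None) as 7bit follows the RFC 2045 default.

```python
import base64
import quopri
from typing import Optional

IDENTITY_ENCODINGS = {"7bit", "8bit", "binary"}


def decode_body(body: bytes, encoding: Optional[str]) -> bytes:
    """Undo the Content-Transfer-Encoding of a body (hypothetical helper)."""
    # RFC 2045: no header means 7bit, i.e. no transformation.
    enc = (encoding or "7bit").strip().lower()
    if enc in IDENTITY_ENCODINGS:
        # Not encoded: copy the bytes as-is.
        return body
    if enc == "quoted-printable":
        return quopri.decodestring(body)
    if enc == "base64":
        # Non-validating mode skips line breaks between base64 lines.
        return base64.b64decode(body)
    raise ValueError(f"unsupported Content-Transfer-Encoding: {encoding!r}")
```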

OTHER TIPS

For MIME messages, RFC 2045 specifies 7bit as the default Content-Transfer-Encoding when the header is absent. To decode quoted-printable data to 8bit, you can use this PHP function:

string quoted_printable_decode ( string $str )
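Outside PHP, Python's standard library provides a counterpart, `quopri.decodestring`, which returns the decoded bytes. The sample input is made up. It shows an escaped UTF-8 byte pair (`=C3=A9`) and a soft line break (a trailing `=`), which joins two encoded lines into one.

```python
import quopri

# "=C3=A9" encodes the two UTF-8 bytes of "é"; "=\n" is a soft line break.
encoded = b"Caf=C3=A9 au lait, tr=\nes bien"
decoded = quopri.decodestring(encoded)
print(decoded.decode("utf-8"))  # Café au lait, tres bien
```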
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow