Why is the attachment size given by the Outlook programming interface always wrong?

https://stackoverflow.com/questions/3078622

28-09-2019
Question

Trying to use Outlook Interop in C#, I noticed a curious thing.

First I get the size of an attachment with Attachment.Size property.
Second, I save the attachment to a file using Attachment.SaveAsFile method.

Comparing the real size of the saved file with the size reported by Outlook, I noticed that the saved file is always smaller than Attachment.Size suggests. The saved files seem to be valid and not truncated.
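The comparison described above can be reproduced with a sketch like the following. It is only an illustration, not the code used in the question: the class name, the `Report` method and the `targetDir` parameter are hypothetical, and it assumes a reference to the Microsoft.Office.Interop.Outlook assembly and a `MailItem` obtained elsewhere.

```csharp
using System;
using System.IO;
using Outlook = Microsoft.Office.Interop.Outlook;

static class AttachmentSizeCheck
{
    // Prints the size reported by Outlook next to the size of the saved file.
    // 'mail' is a MailItem obtained elsewhere; 'targetDir' is a hypothetical,
    // already existing output folder.
    public static void Report(Outlook.MailItem mail, string targetDir)
    {
        // Outlook collections are 1-based.
        for (int i = 1; i <= mail.Attachments.Count; i++)
        {
            Outlook.Attachment attachment = mail.Attachments[i];

            // Read the reported size first, then save the attachment to disk.
            long reported = attachment.Size;
            string path = Path.Combine(targetDir, attachment.FileName);
            attachment.SaveAsFile(path);

            long onDisk = new FileInfo(path).Length;
            Console.WriteLine(
                $"{attachment.FileName}: reported={reported}, saved={onDisk}, " +
                $"difference={reported - onDisk}");
        }
    }
}
```

Printing the difference per attachment makes it easy to check whether the overhead depends on the file size, the file name, or neither.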

Sample results http://www.freeimagehosting.net/uploads/224d342eba.png

So, what's wrong with it? Is there a bug in Attachment.Size? Or maybe it is expected to give something other than the size of an attachment?

My first guess was that Outlook converts CR to CRLF, even in binary files, which would explain the overhead. But some of the attached files are plain text that already uses CRLF, so this hypothesis is wrong.

First edit:

It is not Base64 encoding, because the overhead of Base64 encoding would be:

- A 4/3 ratio. In my case, the ratio is close to 1.0.
- Proportional to the file size. That is not the case here: a 1.9 MB file has an overhead of 181 bytes, whereas a 27 KB file has an overhead of 3 KB.
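To make the Base64 argument concrete, here is a small sketch of the arithmetic. The class and method names are hypothetical, the two sizes are rough stand-ins for the 27 KB and 1.9 MB files mentioned above, and the formula assumes Base64 output without line breaks.

```csharp
using System;

static class Base64Overhead
{
    // Length of the Base64 text for n input bytes, without line breaks:
    // every group of up to 3 input bytes becomes 4 output characters.
    public static long EncodedLength(long n) => 4 * ((n + 2) / 3);

    static void Main()
    {
        // Rough stand-ins for the 27 KB and 1.9 MB attachments.
        long[] sizes = { 27 * 1024, 1900 * 1024 };
        foreach (long n in sizes)
        {
            long overhead = EncodedLength(n) - n;
            Console.WriteLine($"{n} bytes -> Base64 overhead of {overhead} bytes");
        }
    }
}
```

Under these assumptions, Base64 would add roughly 9 KB to the small file and over 600 KB to the large one, which is nowhere near the observed 3 KB and 181 bytes.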

Now, given a nearly random overhead ranging from 89 to 3658 bytes, I would agree that it might be some strange headers.

Second edit:

I tested this on a larger set of files. What I notice is that the difference between the real file size and the size given by Outlook:

- Is always zero for an .msg attachment. But an .msg attachment is a very special case and behaves very strangely.
- Is influenced by both the file extension and the length of the file name.
- For the same file extension, is in most cases (but not always) bigger when the file name is longer.

Here is an example:

alt text http://www.freeimagehosting.net/uploads/a767d3cacf.png

IMHO, Outlook does something with the name of the file, some sort of very strange encoding, maybe generating a unique identifier based on the file name. This would mean that:

- When the file is bigger, the unique identifier is bigger too.
- When a collision happens, something happens to the unique identifier that makes it much, much bigger: row 18 has the same file name as row 11, but the file is not the same; on the other hand, rows 12, 13 and 14 contain the same file.

Solution

I'm not sure, but I'd assume that it might be MIME headers and/or encoding overhead. For more information, look at the Wikipedia article about Base64 and search for the word "overhead".

Edit: Sorry, I wasn't very clear. I mentioned the Base64 article only as an example of how encoding can add overhead, not to suggest that it was actually Base64. As others have mentioned, Base64 overhead would probably be much larger than those differences.

Licensed under: CC-BY-SA with attribution