How to distinguish a "real" mail attachment from pictures in HTML mail?
Question
I was messing around with OpenPop, a library written in C# for fetching mail over POP3. It seems to work OK, but I couldn't work out how to distinguish between files explicitly attached to a mail and things like pictures embedded in HTML mails. The library treats them all as "attachments". For my needs, I wouldn't consider a picture within an HTML mail to be an attachment.
From the library docs:
A MessagePart is considered to be an attachment, if
- it is not holding text and is not a MultiPart message or
- it has a Content-Disposition header that says it is an attachment
What should I do or search for, at least in theoretical terms (because I'm not really familiar with mail protocols)?
Solution
You could check whether the image file is referenced in the body text of the email. You would need to parse the HTML and look for tags such as img, or for the background-image property in a CSS rule. If the image is not used by the message itself, then consider it to be a "genuine" attachment.
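As a rough sketch of that idea: HTML mail usually refers to its embedded images through "cid:" URLs that match the Content-ID header of the image part. The helper below only scans the HTML text for such a reference; it is not a real HTML parser and does not handle percent-encoded ids. The names InlineImageCheck and IsReferencedInHtml are made up for this example, and getting the HTML body and the Content-ID value out of your mail library is left to the caller.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InlineImageCheck
{
    // Returns true when the HTML body refers to the given Content-ID through
    // a "cid:" URL, e.g. <img src="cid:logo@example.com"> or
    // background-image: url(cid:logo@example.com).
    public static bool IsReferencedInHtml(string htmlBody, string contentId)
    {
        if (string.IsNullOrEmpty(htmlBody) || string.IsNullOrEmpty(contentId))
            return false;

        // The Content-ID header value is usually wrapped in angle brackets.
        string id = contentId.Trim().TrimStart('<').TrimEnd('>');
        if (id.Length == 0)
            return false;

        // Match "cid:" + id, but not when the id in the HTML continues with
        // more id-like characters (which would make it a different, longer id).
        string pattern = "cid:" + Regex.Escape(id) + @"(?![\w.@\-])";
        return Regex.IsMatch(htmlBody, pattern, RegexOptions.IgnoreCase);
    }
}
```

A part whose Content-ID is not referenced anywhere in the HTML body (or that has no Content-ID at all) would then be treated as a "genuine" attachment.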
OTHER TIPS
I've done it this way and it seems to work:
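foreach (OpenPop.Mime.MessagePart fileItem in elencoAtt)
{
    // ContentDisposition is null when the part has no Content-Disposition header,
    // so check for null before reading Inline.
    System.Net.Mime.ContentDisposition cDisp = fileItem.ContentDisposition;
    if (cDisp != null && cDisp.Inline)
    {
        // In-line part (e.g. a picture shown in the HTML body)
    }
    else
    {
        // Not in-line: a regular attachment, or a part without the header
    }
}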
I am a developer on OpenPop.NET.
This is just some background information about emails containing attachments. Tony the Pony's answer is the way to go.
I also had the problem of distinguishing attachments from non-attachments when I was implementing that part of OpenPop.NET. MIME has a header, Content-Disposition, which can tell whether a certain part is an attachment or not.
For example, here is an attachment:
Content-Disposition: attachment
and here could be an image belonging to an HTML part of the email:
Content-Disposition: inline
As nice as this may seem, the problem is that many email clients do not add these headers, which makes things hard for readers like OpenPop.NET. We chose not to look into all the HTML parts of an email to see which images are being referred to, so this is currently up to the user of the library.
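To illustrate the header side of this, here is a minimal sketch that classifies a raw Content-Disposition value using System.Net.Mime.ContentDisposition from the .NET base library. PartKind, DispositionCheck and Classify are names invented for this example and are not part of OpenPop.NET. A missing or unparsable header is reported as Unknown, which is exactly the case where you would have to fall back to inspecting the HTML.

```csharp
using System;
using System.Net.Mime;

public enum PartKind { Attachment, Inline, Unknown }

public static class DispositionCheck
{
    // Classifies a raw Content-Disposition header value such as
    // "inline" or "attachment; filename=\"report.pdf\"".
    public static PartKind Classify(string headerValue)
    {
        // Many mail clients omit the header entirely.
        if (string.IsNullOrWhiteSpace(headerValue))
            return PartKind.Unknown;

        try
        {
            var disposition = new ContentDisposition(headerValue);

            // Compare case-insensitively instead of relying on how the
            // parser stores the disposition type.
            bool isInline = string.Equals(
                disposition.DispositionType, "inline",
                StringComparison.OrdinalIgnoreCase);

            // RFC 2183 says unrecognized disposition types should be
            // treated as attachments, so everything else maps to Attachment.
            return isInline ? PartKind.Inline : PartKind.Attachment;
        }
        catch (FormatException)
        {
            // Malformed header: treat it the same as a missing one.
            return PartKind.Unknown;
        }
    }
}
```

Only the Unknown parts need the more expensive check against the HTML body; parts that carry an explicit header can be sorted right away.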
If you develop a good solution to the problem, it could be added as an example for the project.