Question

Certain mail clients allow for the sender to place images directly in the body of their email (instead of as a traditional attachment). When I receive one of these emails in my application, I need to be able to look at only the text/plain message body and determine that the sender embedded an inline image.

I'm trying to craft a RegEx to find image placeholders in the text/plain message body so I can swap them for <img> tags in my own HTML-enabled version of the message. (Wacky, I know, but this is the requirement).

The problem I'm finding is that the placeholders differ based on the sending mail client. For example, when sent from MS Outlook, the text/plain body of the multi-part message looks like this:

Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Check out this image:

[cid:image001.jpg@01CB50D4.769583B0]

Isn't it cool??

A similar message sent from Gmail is a little bit different:

Content-Type: text/plain; charset=ISO-8859-1

Check out this image:

[image: image001.jpg]

Isn't it cool??

The text/html body and image/jpeg part with the base64 encoded image follow.

Has anyone done any research on this before and compiled a list or built a RegEx specifically for this purpose?

I realize a more reliable way to achieve my goal is to look at the text/html portion of the message--which seems to be a bit more standardized from the few tests I've done--but unfortunately I don't have access to that in this scenario.

I'm using C#, if that matters to anyone.

Here's a list of text/plain image placeholders I've compiled thus far:

  • Gmail: [image: filename.jpg]
  • Outlook 2007: [cid:filename.jpg@01CB50D4.769583B0]
  • Thunderbird 3.0.7: none

Answer

I'd suggest going with the HTML part. If you just want to find a placeholder in the plain-text part, this very simple regular expression should be sufficient (PCRE):

^\[.*\]$

At least, this is what works for the examples above. If you'd like to identify the image name, a slightly more complicated expression would be required. Mind that the pattern above will catch every line that starts with [ and ends with ], no matter what the contents are. If you'd like to limit the regexp to certain file types, try this:

(?i)^\[.*(\.jpg|\.jpeg|\.png|\.gif|\.bmp).*\]$

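These examples use Perl-compatible syntax. The same patterns should also work with .NET's Regex class in C#, provided RegexOptions.Multiline is set so that ^ and $ match at line boundaries and any trailing \r from CRLF line endings is accounted for.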
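
For what it's worth, here is a rough sketch of the "slightly more complicated" expression for pulling out the image name and swapping the placeholder for an <img> tag. It is written in Python purely for illustration; the pattern itself should carry over to .NET or Perl with minor changes. It assumes the body has already been decoded from quoted-printable, that each placeholder sits alone on its own line as in the two samples from the question, and that only the Gmail and Outlook 2007 formats listed there need handling. The names PLACEHOLDER and placeholders_to_img are made up for this example, and how the src value is resolved to the actual image part is left to your application.

```python
import re
from html import escape

# Hypothetical pattern covering only the two placeholder styles from the question:
#   Outlook 2007: [cid:image001.jpg@01CB50D4.769583B0]
#   Gmail:        [image: image001.jpg]
PLACEHOLDER = re.compile(
    r"""
    ^\[
    (?:
        cid:(?P<cid>(?P<cid_name>[^@\]\s]+)@[^\]\s]+)   # Outlook: file name, then @suffix
      | image:[ \t]*(?P<file_name>[^\]\r\n]+?)[ \t]*     # Gmail: bare file name
    )
    \](?=\r?$)   # closing bracket must end the line; tolerate a CRLF line ending
    """,
    re.MULTILINE | re.IGNORECASE | re.VERBOSE,
)


def placeholders_to_img(body):
    """Replace recognised image placeholder lines with <img> tags."""

    def to_img(match):
        if match.group("cid") is not None:
            # Assumption: the text after "cid:" is the image part's Content-ID.
            src = "cid:" + match.group("cid")
            alt = match.group("cid_name")
        else:
            # Assumption: the application maps this file name to the image part.
            src = alt = match.group("file_name")
        return '<img src="{}" alt="{}">'.format(
            escape(src, quote=True), escape(alt, quote=True)
        )

    return PLACEHOLDER.sub(to_img, body)


if __name__ == "__main__":
    sample = "Check out this image:\n\n[image: image001.jpg]\n\nIsn't it cool??"
    print(placeholders_to_img(sample))
```

Unlike the catch-all ^\[.*\]$ pattern, this only rewrites lines that match one of the two known formats, so an ordinary bracketed line such as [note] is left alone. Other mail clients would need their own alternatives added to the pattern.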

License: CC-BY-SA with attribution
Not affiliated with StackOverflow