Question

I am looking for a way to parse the text body of multipart/alternative emails. I currently have a Perl script using the Email::MIME module, which parses text/plain and text/html parts correctly. The problem is that when I parse a multipart/alternative email, $part->body always returns empty. I have tried $part->body_raw, which does return the text body, but it also includes the part headers, which I need to omit.

Current output using $part->body_raw

--_000_47C8E15E8EEDCB4E94E891F9414C019A0CB5BDEE79DFW1MBX07mex0_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable 

Text Body 

Desired Output

Text Body

Perl Code

my ( $body, $text_body, $html_body, $multi_body );
for my $part (@parts) {

    if ( $part->content_type =~ m!text/html! ) {
        my $hs = HTML::Strip->new( emit_spaces => 0 );
        $html_body .= $hs->parse( $part->body );
        print "Found HTML\n";
    }
    elsif ($part->content_type =~ m!text/plain!
        or $part->content_type eq '' )
    {
        $text_body .= $part->body;
        print "Found TEXT\n";
    }
    elsif ( $part->content_type =~ m!multipart/alternative! ) {
        print "Found Multipart\n";
        $multi_body .= $part->body;    # this is the call that comes back empty
    }
}

Source

Content-Type: multipart/related;
 boundary="_004_47C8E15E8EEDCB4E94E891F9414C019A0CB5BDEE79DFW1MBX07mex0_";
 type="multipart/alternative"
MIME-Version: 1.0

--_004_47C8E15E8EEDCB4E94E891F9414C019A0CB5BDEE79DFW1MBX07mex0_
Content-Type: multipart/alternative;
 boundary="_000_47C8E15E8EEDCB4E94E891F9414C019A0CB5BDEE79DFW1MBX07mex0_"

--_000_47C8E15E8EEDCB4E94E891F9414C019A0CB5BDEE79DFW1MBX07mex0_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Test Body

Solution

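A multipart is only a container: its content lives in its subparts, which is why asking the multipart itself for a body gives you nothing useful. Iterate over the subparts instead: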

use strict;
use warnings;
use Email::MIME;
use feature qw/say/;

# Single-quoted heredoc: the message text is taken literally, no interpolation.
my $source = <<'EOF';
Content-Type: multipart/related;
 boundary="_004_47C8E15E8EEDCB4E94E891F9414C019A0CB5BDEE79DFW1MBX07mex0_";
 type="multipart/alternative"
MIME-Version: 1.0

--_004_47C8E15E8EEDCB4E94E891F9414C019A0CB5BDEE79DFW1MBX07mex0_
Content-Type: multipart/alternative;
 boundary="_000_47C8E15E8EEDCB4E94E891F9414C019A0CB5BDEE79DFW1MBX07mex0_"

--_000_47C8E15E8EEDCB4E94E891F9414C019A0CB5BDEE79DFW1MBX07mex0_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Test Body
EOF

my $msg = Email::MIME->new($source);

# Top-level parts of the multipart/related message.
for my $part ($msg->parts) {
    if ($part->content_type =~ m!multipart/alternative!) {
        say "Found Multipart";
        # The content lives in the subparts, not in the multipart itself.
        for my $subpart ($part->parts) {
            say $subpart->body;
        }
    }
}

Outputs:

C:\>perl test_mime.pl 
Found Multipart 
Test Body

OTHER TIPS

You need to recurse one level down. The "body" of the alternative part is a text/plain part which you need to retrieve and parse.

You cannot in general assume any particular structure, only that a multipart consists of one or more individual parts (which can themselves be multiparts, nested to any depth), and typically you will want to traverse all of them.
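A traversal that does not depend on the nesting depth could look roughly like the sketch below. It relies on Email::MIME's subparts method, which returns an empty list for a part that is not a multipart. The helper name leaf_parts and the two-level sample message (with its "outer" and "inner" boundaries) are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;
use Email::MIME;

# Hypothetical helper (not part of Email::MIME): return every leaf part,
# i.e. every part that is not itself a multipart, in depth-first order.
sub leaf_parts {
    my ($part) = @_;
    my @children = $part->subparts;    # empty list for a non-multipart part
    return $part unless @children;
    return map { leaf_parts($_) } @children;
}

# Made-up sample: multipart/related wrapping a multipart/alternative.
my $nested_source = join "\n",
    'Content-Type: multipart/related; boundary="outer"',
    'MIME-Version: 1.0',
    '',
    '--outer',
    'Content-Type: multipart/alternative; boundary="inner"',
    '',
    '--inner',
    'Content-Type: text/plain; charset="us-ascii"',
    '',
    'Test Body',
    '--inner',
    'Content-Type: text/html; charset="us-ascii"',
    '',
    '<p>Test Body</p>',
    '--inner--',
    '',
    '--outer--',
    '';

my $nested_msg = Email::MIME->new($nested_source);
my @leaves     = leaf_parts($nested_msg);

# Report the content type of each leaf that was reached.
print $_->content_type, "\n" for @leaves;
```

Because the helper only asks each part for its children, the same code should cope with a message that has no nesting at all, or with more levels than the sample shows.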

multipart/alternative clearly documents that you are expected to pick one of the member parts (perhaps guided by your platform's capabilities and/or your user's preferences), but occasionally multipart/mixed or multipart/related gets used for the same purpose.
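Picking one member by preference might be sketched as follows. The helper pick_alternative, the preference list, and the sample message are all hypothetical; the only Email::MIME calls used are new, subparts, content_type and body. The sample deliberately lists the HTML part first so that the preference order, not the position, decides the result.

```perl
use strict;
use warnings;
use Email::MIME;

# Hypothetical helper (not part of Email::MIME): among the direct children of
# a multipart, return the first child matching the earliest preferred type.
sub pick_alternative {
    my ($multipart, @preferences) = @_;
    my @candidates = $multipart->subparts;
    for my $wanted (@preferences) {
        for my $candidate (@candidates) {
            # A part without a Content-Type header defaults to text/plain.
            my $type = $candidate->content_type // 'text/plain';
            return $candidate if $type =~ m{^\Q$wanted\E\b}i;
        }
    }
    return $candidates[0];    # nothing preferred matched; undef if no children
}

# Made-up sample message with the HTML alternative listed before the plain one.
my $alt_source = join "\n",
    'Content-Type: multipart/alternative; boundary="alt"',
    'MIME-Version: 1.0',
    '',
    '--alt',
    'Content-Type: text/html; charset="us-ascii"',
    '',
    '<p>Test Body</p>',
    '--alt',
    'Content-Type: text/plain; charset="us-ascii"',
    '',
    'Test Body',
    '--alt--',
    '';

my $alt_msg = Email::MIME->new($alt_source);
my $chosen  = pick_alternative($alt_msg, 'text/plain', 'text/html');

print $chosen->content_type, "\n";
print $chosen->body, "\n";
```

Since the helper only inspects the children it is handed, it can be pointed at a multipart/mixed or multipart/related container just as well when a sender has used one of those in place of multipart/alternative.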

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow