Question

I parse an email file into a message object:

    fp = open(self.currentEmailPath, "rb")
    p = email.Parser.Parser()
    self._currentEmailParsedInstance= p.parse(fp)
    fp.close()

From self._currentEmailParsedInstance I want to get the body of the email as plain text only, with no HTML.

How do I do it?


Something like this?

        newmsg=self._currentEmailParsedInstance.get_payload()
        body=newmsg[0].get_content....?

Then I would strip the HTML from body. What exactly is the method (the .... above) that returns the actual text? Maybe I misunderstand you:

        msg=self._currentEmailParsedInstance.get_payload()
        print type(msg)

Output: <type 'list'>
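
The list result above is expected for a multipart message. A minimal sketch of the difference, written for Python 3 (the snippets in this thread are Python 2) and using two made-up messages, SIMPLE and MULTI, that are not from the original post:

```python
import email

# Hypothetical single-part and multipart messages, built inline for illustration.
SIMPLE = "Subject: hi\nContent-Type: text/plain\n\nJust text\n"
MULTI = (
    "Subject: hi\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "plain body\n"
    "--XX\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>html body</p>\n"
    "--XX--\n"
)

simple = email.message_from_string(SIMPLE)
multi = email.message_from_string(MULTI)

# A non-multipart message returns its body as a string...
simple_payload = simple.get_payload()
# ...while a multipart message returns a list of sub-messages.
multi_payload = multi.get_payload()

print(type(simple_payload), type(multi_payload))
```

Checking is_multipart() before indexing into the payload avoids accidentally indexing into a string.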


The email:

Return-Path:
Received: from xx.xx.net (example) by mxx3.xx.net (xxx)
id 485EF65F08EDX5E12 for xxx@xx.com; Thu, 23 Oct 2008 06:07:51 +0200
Received: from xxxxx2 (ccc) by example.net (ccc) (authenticated as xxxx.xxx@example.com) id 48798D4001146189 for example.example@example-example.com; Thu, 23 Oct 2008 06:07:51 +0200
From: "example"
To:
Subject: FW: example Date: Thu, 23 Oct 2008 12:07:45 +0800
Organization: example Message-ID: <001601c934c4$xxxx30$a9ff460a@xxx>
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_NextPart_000_0017_01C93507.F6F64E30"
X-Mailer: Microsoft Office Outlook 11
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138
Thread-Index: Ack0wLaumqgZo1oXSBuIpUCEg/wfOAABAFEA

This is a multi-part message in MIME format.

------=_NextPart_000_0017_01C93507.F6F64E30
Content-Type: multipart/alternative;
boundary="----=_NextPart_001_0018_01C93507.F6F64E30"

------=_NextPart_001_0018_01C93507.F6F64E30
Content-Type: text/plain;
charset="us-ascii"
Content-Transfer-Encoding: 7bit

From: example.example[mailto:example@example.com]
Sent: Thursday, October 23, 2008 11:37 AM
To: xxxx@example.com
Subject: S/I for example(B/L
No.:4357-0120-810.044)

Please find attached the example.doc),

Thanks.

B.rgds,

xxx xxx

------=_NextPart_001_0018_01C93507.F6F64E30
Content-Type: text/html;
charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

xmlns:o=3D"urn:schemas-microsoft-com:office:office" =
xmlns:w=3D"urn:schemas-microsoft-com:office:word" =
xmlns:st1=3D"urn:schemas-microsoft-com:office:smarttags" =
xmlns=3D"http://www.w3.org/TR/REC-html40">

HTML STUFF till

------=_NextPart_001_0018_01C93507.F6F64E30--

------=_NextPart_000_0017_01C93507.F6F64E30
Content-Type: application/msword;
name="xxxx.doc"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="xxxx.doc"

0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/CQAGAAAAAAAAAAAAAAABAAAAYAAAAAAAAAAA EAAAYgAAAAEAAAD+////AAAAAF8AAAD///////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// ///////////////////////////////////////////////////////////////////////////s pcEAI2AJBAAA+FK/AAAAAAAAEAAAAAAABgAAnEIAAA4AYmpiaqEVoRUAAAAAAAAAAAAAAAAAAAAA AAAECBYAMlAAAMN/AADDfwAAQQ4AAAAAAAAPAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD//w8AAAAA AAAAAAD//w8AAAAAAAAAAAD//w8AAAAAAAAAAAAAAAAAAAAAAKQAAAAAAEYEAAAAAAAARgQAAEYE AAAAAAAARgQAAAAAAABGBAAAAAAAAEYEAAAAAAAARgQAABQAAAAAAAAAAAAAAFoEAAAAAAAA4hsA AAAAAADiGwAAAAAAAOIbAAA4AAAAGhwAAHwAAACWHAAARAAAAFoEAAAAAAAABzcAAEgBAADmHAAA FgAAAPwcAAAAAAAA/BwAAAAAAAD8HAAAAAAAAPwcAAAAAAAA/BwAAAAAAAD8HAAAAAAAAPwcAAAA AAAAMjYAAAIAAAA0NgAAAAAAADQ2AAAAAAAANDYAAAAAAAA0NgAAAAAAADQ2AAAAAAAANDYAACQA AABPOAAAaAIAALc6AACOAAAAWDYAAGkAAAAAAAAAAAAAAAAAAAAAAAAARgQAAAAAAABHLAAAAAAA AAAAAAAAAAAAAAAAAAAAAAD8HAAAAAAAAPwcAAAAAAAARywAAAAAAABHLAAAAAAAAFg2AAAAAAAA

------=_NextPart_000_0017_01C93507.F6F64E30--


I just want to get:

From: xxxx.xxxx [mailto:xxxx@example.com]
Sent: Thursday, October 23, 2008 11:37 AM
To: xxxx@example.com
Subject: S/I for xxxxx (B/L
No.:4357-0120-810.044)

Pls find attached the xxxx.doc),

Thanks.

B.rgds,

xxx xxx


I'm not sure whether the mail is malformed. It seems that when the message also carries an HTML version, you have to descend through the nested parts like this:

        parts = self._currentEmailParsedInstance.get_payload()
        print parts[0].get_content_type()      # multipart/alternative
        textParts = parts[0].get_payload()
        print textParts[0].get_content_type()  # text/plain
        body = textParts[0].get_payload()
        print body                             # the text, without a problem!

Thank you so much, Vinko.

So it's kind of like dealing with XML: recursive in nature.
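
Since the structure is recursive, the nesting does not have to be unpacked by hand. A rough Python 3 sketch using Message.walk(), which visits the message and every nested part depth-first; the helper name first_plain_text and the sample message RAW are made up for illustration:

```python
import email

# Hypothetical message shaped like the one in the question:
# multipart/mixed > multipart/alternative > (text/plain, text/html) + attachment
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="OUTER"\n'
    "\n"
    "--OUTER\n"
    'Content-Type: multipart/alternative; boundary="INNER"\n'
    "\n"
    "--INNER\n"
    'Content-Type: text/plain; charset="us-ascii"\n'
    "\n"
    "Thanks.\n"
    "--INNER\n"
    'Content-Type: text/html; charset="us-ascii"\n'
    "\n"
    "<p>Thanks.</p>\n"
    "--INNER--\n"
    "--OUTER\n"
    "Content-Type: application/octet-stream\n"
    "\n"
    "AAAA\n"
    "--OUTER--\n"
)


def first_plain_text(message):
    """Return the payload of the first text/plain part, or None if there is none."""
    # walk() yields the message itself, then each sub-part, however deeply nested.
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            return part.get_payload()
    return None


message = email.message_from_string(RAW)
body = first_plain_text(message)
print(body)
```

This returns the first text/plain part it meets, so a plain-text attachment that comes before the body would be picked up instead; that is a limitation of the sketch.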


Solution

This will get you the contents of the message:

    self.currentEmailParsedInstance.get_payload()

As for the text-only part, you will have to strip the HTML on your own, for example using BeautifulSoup.
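
If installing BeautifulSoup is not an option, a rough alternative is the standard library's html.parser. This is a different technique from the one suggested above, sketched for Python 3; TextExtractor and strip_html are made-up names:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()  # character references are converted by default
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join the text nodes with spaces and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_html(html):
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.text()


stripped = strip_html("<p>Thanks.</p><p>B.rgds,<br>xxx</p>")
print(stripped)
```

Joining every text node with a space is crude: it also inserts a space where an inline tag splits a word. For real-world HTML mail a dedicated library is likely to give better results.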

See the Python documentation for the Message class that the Parser returns for more information. If you mean getting the text part of a message that contains both an HTML and a plain-text version of itself, you can pass an index to get_payload() to get the part you want.
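
A small Python 3 sketch of the indexed form. The message is built in memory with the email.mime helpers purely for illustration; its contents are made up:

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical message carrying a plain-text and an HTML alternative.
message = MIMEMultipart("alternative")
message.attach(MIMEText("Message Text", "plain"))
message.attach(MIMEText("<p>Message Text</p>", "html"))

# get_payload(i) returns the i-th sub-part of a multipart message.
plain_part = message.get_payload(0)
html_part = message.get_payload(1)

plain_type = plain_part.get_content_type()
plain_text = plain_part.get_payload()
print(plain_type, repr(plain_text))
```

Passing an index only makes sense for multipart messages; on a message whose payload is a plain string, get_payload(0) raises TypeError, so checking is_multipart() first is the safer route.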

I tried with a different MIME email because what you pasted seems malformed; hopefully it only got malformed when you edited it for posting.

    >>> import email.parser
    >>> parser = email.parser.Parser()
    >>> message = parser.parse(open('/home/vinko/jlm.txt','r'))
    >>> message.is_multipart()
    True
    >>> parts = message.get_payload()
    >>> len(parts)
    2
    >>> parts[0].get_content_type()
    'text/plain'
    >>> parts[1].get_content_type()
    'message/rfc822'
    >>> parts[0].get_payload()
    'Message Text'

parts will contain all the parts of the multipart message. You can check their content types as shown and keep only the text/plain ones, for instance.
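
That filtering step could look roughly like this in Python 3. The sketch also decodes each part, since text parts are often quoted-printable or base64 encoded; plain_text_parts and the sample message RAW are made up for illustration:

```python
import email

# Hypothetical message: one quoted-printable text part plus an attachment.
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    'Content-Type: text/plain; charset="us-ascii"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Total =3D 42\n"
    "--XX\n"
    "Content-Type: application/octet-stream\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "AAAA\n"
    "--XX--\n"
)


def plain_text_parts(message):
    """Return the decoded text of every top-level text/plain part."""
    if not message.is_multipart():
        return []
    texts = []
    for part in message.get_payload():
        if part.get_content_type() != "text/plain":
            continue
        # decode=True undoes the Content-Transfer-Encoding and returns bytes.
        raw_bytes = part.get_payload(decode=True)
        charset = part.get_content_charset() or "us-ascii"
        texts.append(raw_bytes.decode(charset, errors="replace"))
    return texts


message = email.message_from_string(RAW)
texts = plain_text_parts(message)
print(texts)
```

Only the top level is inspected here, so a text/plain part nested inside a multipart/alternative container is not found by this sketch.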

Good luck.

OTHER TIPS

I ended up with this:

        # Body of a method; needs "import email.parser" at module level.
        # Assumes the message is multipart.
        parser = email.parser.Parser()
        with open('/home/vinko/jlm.txt', 'r') as fp:
            self._email = parser.parse(fp)
        parts = self._email.get_payload()
        # Only the first part is inspected, one or two levels deep.
        check = parts[0].get_content_type()
        if check == "text/plain":
            return parts[0].get_payload()
        elif check == "multipart/alternative":
            # The text/plain alternative is expected to come first.
            part = parts[0].get_payload()
            if part[0].get_content_type() == "text/plain":
                return part[0].get_payload()
        return "cannot obtain the body of the email"
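
On Python 3.6 or later, a similar result may be possible without checking the levels by hand: parsing with email.policy.default yields an EmailMessage, whose get_body() searches the nested parts for a preferred body type. A sketch under that assumption; plain_body and the sample message RAW are made up for illustration:

```python
import email
from email import policy

# Hypothetical message shaped like the one in the question.
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="OUTER"\n'
    "\n"
    "--OUTER\n"
    'Content-Type: multipart/alternative; boundary="INNER"\n'
    "\n"
    "--INNER\n"
    'Content-Type: text/plain; charset="us-ascii"\n'
    "\n"
    "Thanks.\n"
    "--INNER\n"
    'Content-Type: text/html; charset="us-ascii"\n'
    "\n"
    "<p>Thanks.</p>\n"
    "--INNER--\n"
    "--OUTER--\n"
)


def plain_body(raw_text):
    """Return the plain-text body as a str, or None if there is none."""
    message = email.message_from_string(raw_text, policy=policy.default)
    # Ask only for a text/plain body; HTML alternatives are ignored.
    body = message.get_body(preferencelist=("plain",))
    return body.get_content() if body is not None else None


result = plain_body(RAW)
print(result)
```

Returning None instead of an error string lets the caller tell a missing body apart from a body that happens to contain that text.
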
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow