Question

What is the best way to parse the mail headers and get the email address in the "return-path" field using Python?

Return-Path: <bob@example2.com>
Date: Sat, 16 Feb 2013 14:14:32 -0500
Subject: Hello World!
From: Robert Jones <robert.jones@example2.com>
To: Steve <steve@example.com>

P.S. I am a bit of a Python n00b and this code will need to run on AppEngine.

Solution

First, you probably want to use the stdlib's email package to parse the message.

I'm not sure how you're getting the message. Things like the stdlib imaplib or the Gmail API give you a way to get headers separately from the body, but other methods may give you the entire message. Either way, you can pass the whole thing to email.parser.HeaderParser, which parses only the headers and does not try to parse the body:

>>> from email.parser import HeaderParser

>>> msg = HeaderParser().parsestr(header) # or parsestr(msg) if you have the whole msg
>>> return_path = msg.get('Return-Path')

Now, return_path is the string "<bob@example2.com>" (or None, if the message has no Return-Path header), which you can parse as an email address:

>>> from email.utils import parseaddr
>>> realname, emailaddr = parseaddr(return_path)

Now, realname is "", and emailaddr is 'bob@example2.com'.

There are two parts because parseaddr also handles addresses that carry a display name, like this:

Return-Path: "Bob Example" <bob@example.com>
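
Putting the steps so far together, here is a minimal sketch that runs against the sample headers from the question. The raw_headers string and the extract_return_path helper name are illustrative assumptions, not part of any library:

```python
from email.parser import HeaderParser
from email.utils import parseaddr

# Sample headers copied from the question; in real code this would be
# whatever your mail source hands you (an assumption for illustration).
raw_headers = (
    "Return-Path: <bob@example2.com>\n"
    "Date: Sat, 16 Feb 2013 14:14:32 -0500\n"
    "Subject: Hello World!\n"
    "From: Robert Jones <robert.jones@example2.com>\n"
    "To: Steve <steve@example.com>\n"
    "\n"
)


def extract_return_path(text):
    """Return the Return-Path address, or None if the header is absent."""
    msg = HeaderParser().parsestr(text)
    return_path = msg.get("Return-Path")
    if return_path is None:
        return None
    # parseaddr returns (display name, address); only the address is needed.
    realname, emailaddr = parseaddr(return_path)
    return emailaddr or None


print(extract_return_path(raw_headers))
```

The helper returns None both when the header is missing and when no address can be parsed out of it, so the caller only has one "nothing found" case to handle.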

Now, this may not be quite right. Are you allowed to have two Return-Path headers? Can a Return-Path header include multiple addresses? I can't remember. I could look it up in the relevant RFCs, but then I'd also have to find out whether any popular clients violate those particular rules. So, for convenience, I usually assume any header can appear more than once and can hold multiple values, and do things this way:

>>> return_paths = msg.get_all('Return-Path', [])

This returns the list ["<bob@example2.com>"]. The second argument matters: by default, get_all returns None when there are no Return-Path headers, and passing [] gives you an empty list instead. You can then parse the whole list at once, to get a list of (name, address) pairs instead of just one:

>>> from email.utils import getaddresses
>>> for realname, emailaddr in getaddresses(return_paths):
...     print(realname, emailaddr)

And if it turns out that Return-Path only allows a single value, the same code just works as-is.
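
As a rough, self-contained sketch of the multiple-header approach: the message below is hypothetical (it has two Return-Path headers, one with a display name) and exists only to show what the list-based calls give back.

```python
from email.parser import HeaderParser
from email.utils import getaddresses

# Hypothetical headers for illustration: two Return-Path lines,
# the second one carrying a display name.
raw_headers = (
    'Return-Path: <bob@example2.com>\n'
    'Return-Path: "Bob Example" <bob@example.com>\n'
    'Subject: Hello World!\n'
    '\n'
)

msg = HeaderParser().parsestr(raw_headers)

# Pass [] as the fallback so a missing header yields an empty list
# rather than None, which getaddresses could not iterate over.
return_paths = msg.get_all("Return-Path", [])

pairs = getaddresses(return_paths)
for realname, emailaddr in pairs:
    print(repr(realname), emailaddr)
```

With a message that has no Return-Path header at all, return_paths is an empty list and the loop simply does not run.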

OTHER TIPS

You can use the split() function and then strip():

line = "Return-Path: <bob@example2.com>"
header, value = line.split(":", 1)  # split on the first colon only; values may contain colons
value = value.strip()

P.S. If you need to get rid of the angle brackets, just call strip() again:

value = value.strip('<>')
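
A small sketch of the manual approach wrapped in a function, assuming each header sits on a single line (no folded continuation lines) and has no display name. The split_header name is made up for this example:

```python
def split_header(line):
    """Split one unfolded header line into (name, value)."""
    # partition splits on the first colon only, so values that contain
    # colons (such as the time in a Date header) stay intact.
    name, _, value = line.partition(":")
    return name.strip(), value.strip()


name, value = split_header("Return-Path: <bob@example2.com>")
address = value.strip("<>")  # drop the surrounding angle brackets
print(name, address)

# A value containing colons still comes back whole.
print(split_header("Date: Sat, 16 Feb 2013 14:14:32 -0500"))
```

This is fine for quick scripts, but it does not handle folded headers or a form like "Bob Example" <bob@example.com>, which is where the email package is the safer choice.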
Licensed under: CC-BY-SA with attribution