First, you probably want to use the stdlib's email package to parse the message.
I'm not sure how you're getting the message. Things like the stdlib imaplib or the Gmail API give you a way to get headers separately from the body, but other methods may give you the entire message. Either way, you can pass the whole thing to email.parser.HeaderParser to parse the headers and ignore anything else:
>>> from email.parser import HeaderParser
>>> msg = HeaderParser().parsestr(header) # or parsestr(msg) if you have the whole msg
>>> return_path = msg.get('Return-Path')
Now, return_path is the string "<bob@example2.com>" (or None, if there isn't one), which you can just parse as an email address.
>>> from email.utils import parseaddr
>>> realname, emailaddr = parseaddr(return_path)
Now, realname is "" and emailaddr is 'bob@example2.com'.
There are two parts because parseaddr also handles the display-name form that address headers can use:
Return-Path: "Bob Example" <bob@example.com>
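Putting the steps so far together, here is a minimal sketch. The raw message is made up for illustration, and it assumes the default (compat32) policy, under which msg.get returns a plain string:

```python
from email.parser import HeaderParser
from email.utils import parseaddr

# Hypothetical raw message, invented for this example.
raw = (
    "Return-Path: <bob@example2.com>\n"
    "From: Bob Example <bob@example.com>\n"
    "Subject: Hello\n"
    "\n"
    "Body text that HeaderParser leaves unparsed.\n"
)

msg = HeaderParser().parsestr(raw)
return_path = msg.get('Return-Path')  # None if the header is absent

# parseaddr splits one address into (realname, emailaddr).
bare = parseaddr(return_path)
named = parseaddr('"Bob Example" <bob@example.com>')

print(bare)   # ('', 'bob@example2.com')
print(named)  # ('Bob Example', 'bob@example.com')
```

If the header might be missing, check return_path for None before calling parseaddr.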
Now, this may not be quite right. Are you allowed to have two Return-Path headers? Or can the Return-Path header include multiple addresses? I can't remember. I could look it up in the relevant RFCs, but then I'd also have to do some searching to find out whether any popular clients violate these particular rules. So, for convenience, I usually assume any header can appear multiple times and hold multiple values, and do things this way:
>>> return_paths = msg.get_all('Return-Path', [])
This returns the list ["<bob@example2.com>"]. (The second argument is the fallback: get_all returns None by default when there are no Return-Path headers, so passing [] gets you an empty list instead.) And you can just parse that all at once, to get a list of name, address pairs instead of just one:
>>> from email.utils import getaddresses
>>> for realname, emailaddr in getaddresses(return_paths):
...     print(realname, emailaddr)
And if it turns out that Return-Path only allows a single value, the same code just works as-is.
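As a rough sketch, the multiple-headers approach can be wrapped in one helper. The function name return_path_addresses and the sample message are my own inventions for illustration, not anything from the stdlib:

```python
from email.parser import HeaderParser
from email.utils import getaddresses

def return_path_addresses(raw_message):
    """Return every address found in any Return-Path header (hypothetical helper)."""
    msg = HeaderParser().parsestr(raw_message)
    # The [] fallback keeps getaddresses() from being handed None.
    values = msg.get_all('Return-Path', [])
    # Drop entries where no address could be parsed.
    return [addr for _name, addr in getaddresses(values) if addr]

# Made-up message with two Return-Path headers.
sample = (
    "Return-Path: <bob@example2.com>\n"
    "Return-Path: <alice@example.org>\n"
    "Subject: Hello\n"
    "\n"
    "Body\n"
)

print(return_path_addresses(sample))
print(return_path_addresses("Subject: no return path\n\nBody\n"))
```

The first call should print both addresses and the second an empty list, so callers can loop over the result without a separate None check.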