Python Email Header parsing get_all()

https://stackoverflow.com/questions/6830611

27-10-2019

Question

I'm parsing mailbox files with Python and stumbled upon strange behavior when trying to get all "To:" headers with get_all():

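from email.utils import getaddresses

tos = message.get_all('to', [])  # all "To:" header values, [] if none
if tos:
    tos = getaddresses(tos)  # list of (name, address) pairs
    for to in tos:
        receiver = EmailInformant()
        receiver_email = to[1]  # address part of the pair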

get_all() returns all "To:" values, whose recipients are separated by commas, as far as I know. getaddresses() then splits each recipient into a name and an email address. For the following "To:" header, it does not work as I would expect:

To: example@test.de <example@test.de>

Here, the email address is given as both the name and the address, but the parser treats this as two separate "To:" entries, so the for loop runs twice. Is this a bug?
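For comparison, here is a minimal sketch of the same two calls on a well-formed message. The message text and the example.com addresses are made up for illustration. The header from above is only printed, not checked, because the result may differ between Python versions.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical message with two "To:" header lines (illustrative addresses).
raw = (
    "To: Alice <alice@example.com>, bob@example.com\n"
    "To: carol@example.com\n"
    "\n"
    "body\n"
)
message = message_from_string(raw)

# get_all() returns one string per "To:" header line, not one per recipient.
header_values = message.get_all('to', [])

# getaddresses() splits those strings into (name, address) pairs.
pairs = getaddresses(header_values)
print(pairs)

# The header from the question; the result is printed rather than asserted
# because it may vary with the Python version in use.
print(getaddresses(['example@test.de <example@test.de>']))
```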

Solution

Parsing emails is very hard, as there are several different specifications, many behaviors that are or were poorly defined, and implementations that don't follow the specifications. Many of them conflict in some ways.

I know the email module in the standard library is currently being rewritten for Python 3.3, see http://www.bitdance.com/blog/. The rewrite should solve problems like this; it is currently available on pypi for Python 3.2 if you have that option: http://pypi.python.org/pypi/email.
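As a rough sketch of what the rewritten API looks like in Python 3.3 and later: passing policy=email.policy.default makes address headers come back as objects with an addresses attribute instead of plain strings. The message text below is made up, and how this parser handles the malformed header from the question is not shown here, so test that against your own data.

```python
from email import message_from_string, policy

# Hypothetical well-formed message (illustrative addresses).
raw = (
    "To: Alice <alice@example.com>, bob@example.com\n"
    "\n"
    "body\n"
)

# policy.default selects the newer header classes (Python 3.3 or later).
message = message_from_string(raw, policy=policy.default)

recipients = []
for header in message.get_all('to', []):
    # Each address header exposes already-parsed Address objects.
    for address in header.addresses:
        recipients.append((address.display_name, address.addr_spec))

print(recipients)
```

With this policy there is no need to call getaddresses() on the header values yourself.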

Meanwhile, try tos = set(getaddresses(tos)) to eliminate duplicates.
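Note that set() does not keep the recipients in their original order. If order matters, something like the following sketch might work. unique_addresses is a hypothetical helper name, not part of the email module. It also skips pairs with an empty address, on the assumption that such pairs only indicate input that getaddresses() could not parse.

```python
from email.utils import getaddresses


def unique_addresses(header_values):
    """Return (name, address) pairs without duplicates, in first-seen order."""
    pairs = getaddresses(header_values)
    # dict.fromkeys() keeps the first occurrence of each pair and, unlike
    # set(), preserves insertion order (guaranteed since Python 3.7).
    # Pairs with an empty address are dropped (assumption: parse failures).
    return [pair for pair in dict.fromkeys(pairs) if pair[1]]


# Illustrative input: the same address twice, plus a second recipient.
print(unique_addresses(['a@example.com, a@example.com, Bob <b@example.com>']))
```

This removes duplicates only when both name and address match exactly. Comparing on the address alone would be a different design choice.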

Licensed under: CC-BY-SA with attribution
