I'm fetching emails over IMAP using Python and imaplib. In this specific case I'm having a problem with the To: addresses.
I extract the encoded To: field, separate the addresses and then try to decode each one. One specific string fails. I'm using Python's decode_header function (from email.header) to decode the Quoted-Printable/Base64 encoded words. I start with the encoded string:
'=?utf-8?b?vmfzy28gugf0csoty2lv?= <vasco.patricio.pessoal@gmail.com>'
It's supposed to be Vasco Patrício <vasco.patricio.pessoal@gmail.com>
(my name and email). As expected, decode_header returns a list of (decoded string, charset) pairs, which here is this list of 2 tuples:
[('\xbeg\xf3\xcbo \xba\x07\xf4r\xca-\xcbio', 'utf-8'), ('<vasco.patricio.pessoal@gmail.com>', None)]
However, when I try to decode the first tuple using this very simple code:
for part in decoded_parts:
    if part[1]:
        # part[1] is the charset reported by decode_header
        part_text = part[0].decode(part[1])
    else:
        # no charset given: keep the raw string as-is
        part_text = part[0]
I obtain a UnicodeDecodeError:
UnicodeDecodeError at /api/refresh/emails/
'utf8' codec can't decode byte 0xbe in position 0: invalid start byte
I confirm that trying to decode it via the console results in the same exception.
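For reference, here is a minimal sketch that reproduces the failure in isolation. It assumes Python 3, where decode_header returns bytes for each part when the header contains encoded words; the names raw_to, first_bytes, first_charset and decode_error are only illustrative and not part of my actual code.

```python
from email.header import decode_header

# The encoded To: value exactly as shown above.
raw_to = '=?utf-8?b?vmfzy28gugf0csoty2lv?= <vasco.patricio.pessoal@gmail.com>'

# decode_header returns a list of (decoded_bytes, charset) pairs.
decoded_parts = decode_header(raw_to)
first_bytes, first_charset = decoded_parts[0]

# Try to turn the first part into text using the charset it reports.
try:
    part_text = first_bytes.decode(first_charset)
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(decoded_parts)
print(repr(decode_error))
```

Running this shows the same list of tuples and the same "invalid start byte" complaint about byte 0xbe, so the problem does not depend on imaplib or on the rest of my application.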
Isn't decode_header supposed to return strings that can be decoded with the charset it reports alongside them?
Thank you