Python and imaplib: Obtain attachment names or body without downloading full email

Question 1

Assuming you're asking what I think you're asking, here's what to do:

First, fetch the BODYSTRUCTURE. Assuming Gmail's IMAP server supports this, you'll get back something like this:

(("TEXT" "PLAIN" ("CHARSET" "UTF-8") NIL NIL "QUOTED-PRINTABLE" 56 1 NIL NIL NIL NIL)
 ("TEXT" "HTML" ("CHARSET" "UTF-8") (NAME "") NIL NIL "BASE64" 12345 NIL 
  ("attachment" ("FILENAME" "")) NIL NIL) 
 ("IMG" "JPEG" (NAME "funny picture") NIL NIL "BASE64" 56789 NIL
  ("attachment" ("FILENAME" "image.jpg")) NIL NIL))
 "MIXED" ("BOUNDARY" "----_=_NextPart_001_1234ABCD.56789EF0") NIL NIL NIL)

And then fetch the (BODY ENVELOPE) if the structure has one.

If you look at RFC 3501, section 7.4.2, it explains how to deal with these.

Once you've determined that the (BODY[1]) and (BODY[2]) are the plain-text and HTML versions of the main content, and (BODY[3]) is the first real attachment, you download the plain-text body by fetching (BODY[1]), and you've got the name of the attachment from the structure.
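A minimal sketch of that last step, assuming `mail` is an imaplib connection that has already logged in and selected a mailbox. `fetch_part` is a hypothetical helper name, not something imaplib provides. It asks for BODY.PEEK[...] instead of BODY[...] so the message is not flagged as read; that is a choice made for this sketch, and plain BODY[1] works the same way otherwise.

```python
def fetch_part(conn, uid, section="1"):
    """Fetch one MIME part by its section number, e.g. "1" or "1.2".

    conn is assumed to be a logged-in imaplib.IMAP4 / IMAP4_SSL object
    with a mailbox already selected.
    """
    # BODY.PEEK[n] returns the same data as BODY[n] without setting \Seen.
    typ, data = conn.uid("fetch", uid, "(BODY.PEEK[%s])" % section)
    if typ != "OK":
        raise RuntimeError("fetch failed: %r" % (data,))
    # imaplib hands back literals as (header, payload) tuples; the other
    # items in the list are plain bytes such as the closing b")".
    for item in data:
        if isinstance(item, tuple):
            return item[1]
    return None


# Hypothetical usage, given a connection called mail and a UID from a search:
#   raw = fetch_part(mail, uids[-1], "1")
```

The bytes that come back are still in the transfer encoding named in the structure (QUOTED-PRINTABLE for part 1 in the example above), so they need decoding, for instance with quopri.decodestring, before you display them.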

Sorry there's no code here. I don't think either imaplib or any of the stdlib MIME- and mail-related modules will do the hard part for you (interpreting the structure), but I haven't actually checked, so I'd look there first, and, if not, go to PyPI to see if anyone else has already written the code.

Well, actually, first I'd just fetch BODYSTRUCTURE, (BODY ENVELOPE) and (BODY[3]) for a specific message to make sure Gmail has complete support before writing a whole mess of code…

PS, if worst comes to worst, and your use case is as simple and rigid as you described, you can just always fetch BODYSTRUCTURE and (BODY[1]), fall back to RFC822 if that fails, and get the attachment names by running a hacky regexp on the structure instead of a real parse. I wouldn't write this for anything but a one-shot script or a quick-and-dirty prototype to learn about Gmail, but for those cases, I probably would.
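For what it's worth, the hacky regexp could look roughly like this. It is a sketch under the assumption that the server sends the filename as a quoted string right after a "FILENAME" key, as in the example structure above; `attachment_names` is a made-up name. It will miss filenames sent as IMAP literals or in RFC 2231 form (FILENAME*), and it does not decode RFC 2047 encoded-words.

```python
import re

# Matches "FILENAME" "<value>", allowing backslash escapes inside the value.
_FILENAME_RE = re.compile(r'"FILENAME"\s+"((?:[^"\\]|\\.)*)"', re.IGNORECASE)


def attachment_names(bodystructure):
    """Return the non-empty FILENAME values found in a BODYSTRUCTURE response."""
    if isinstance(bodystructure, bytes):
        # imaplib returns bytes on Python 3; the structure itself is ASCII.
        bodystructure = bodystructure.decode("ascii", "replace")
    return [name for name in _FILENAME_RE.findall(bodystructure) if name]
```

Because the pattern includes the opening quote of the key, a Content-Type parameter spelled "NAME" is not picked up, so each attachment is reported once.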

Question 2

[Edit]

Ok here we go =)

>>> import imaplib, email
>>> mail = imaplib.IMAP4_SSL('imap.gmail.com')
>>> mail.login('emailaddr@gmail.com', 'password')
('OK', ['emailaddr@gmail.com Inget Namn authenticated (Success)'])
>>> mail.select('inbox')
('OK', ['14'])
>>> result, data = mail.uid('search', None, 'ALL')
>>> uids=data[0].split()
>>> result, data = mail.uid('fetch', uids[-1], 'BODYSTRUCTURE')
>>> print data
['14 (UID 340 BODYSTRUCTURE ((("TEXT" "PLAIN" ("CHARSET" "ISO-8859-1") NIL NIL "7BIT" 17 1 NIL NIL NIL)("TEXT" "HTML" ("CHARSET" "ISO-8859-1") NIL NIL "7BIT" 17 1 NIL NIL NIL) "ALTERNATIVE" ("BOUNDARY" "20cf3071d16a5a877b04d0adcc43") NIL NIL)("APPLICATION" "PDF" ("NAME" "attiny40.pdf") NIL NIL "BASE64" 8429956 NIL ("ATTACHMENT" ("FILENAME" "attiny40.pdf")) NIL) "MIXED" ("BOUNDARY" "20cf3071d16a5a878104d0adcc45") NIL NIL))']
>>>

The attachment for this message is called "attiny40.pdf" and you can clearly see that name in the BODYSTRUCTURE. All that is left is parsing that BODYSTRUCTURE.

The session above is pretty much taken straight from the last link below.
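As for the parsing step, here is a rough sketch of one way to do it; it is not taken from the linked post. It assumes the response is a single line of quoted strings, atoms and parentheses like the one printed above (no IMAP literals), and that there are no nested message/rfc822 parts. The function names `parse_list`, `bodystructure`, `leaf_parts` and `filename` are all made up for the illustration.

```python
import re

# Tokens: "(", ")", a quoted string, or a bare atom (NIL, numbers, keywords).
_TOKEN_RE = re.compile(r'\(|\)|"((?:[^"\\]|\\.)*)"|[^\s()"]+')


def parse_list(text):
    """Turn an IMAP parenthesized list into nested Python lists."""
    stack = [[]]
    for match in _TOKEN_RE.finditer(text):
        token = match.group(0)
        if token == "(":
            stack.append([])
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif match.group(1) is not None:
            # Quoted string: drop the quotes and undo backslash escapes.
            stack[-1].append(re.sub(r"\\(.)", r"\1", match.group(1)))
        else:
            stack[-1].append(None if token.upper() == "NIL" else token)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def bodystructure(response_line):
    """Pull the BODYSTRUCTURE value out of one FETCH response line."""
    if isinstance(response_line, bytes):
        response_line = response_line.decode("ascii", "replace")
    items = parse_list(response_line)[1]  # ["UID", "340", "BODYSTRUCTURE", [...]]
    return items[items.index("BODYSTRUCTURE") + 1]


def leaf_parts(struct, section=""):
    """Yield (section_number, part) for every non-multipart part."""
    if isinstance(struct[0], list):  # multipart: the child parts come first
        for number, child in enumerate(struct, 1):
            if not isinstance(child, list):
                break  # reached the subtype string, e.g. "MIXED"
            child_section = "%s.%d" % (section, number) if section else str(number)
            yield from leaf_parts(child, child_section)
    else:
        yield section or "1", struct


def filename(part):
    """Return the FILENAME (or NAME) parameter of a leaf part, or None."""
    # A disposition looks like ["ATTACHMENT", ["FILENAME", "x.pdf"]].
    param_lists = [field[1] for field in part[7:]
                   if isinstance(field, list) and len(field) == 2
                   and isinstance(field[1], list)]
    if len(part) > 2 and isinstance(part[2], list):
        param_lists.append(part[2])  # Content-Type parameters, e.g. NAME
    for params in param_lists:
        pairs = zip(params[0::2], params[1::2])
        lookup = {k.upper(): v for k, v in pairs if isinstance(k, str)}
        name = lookup.get("FILENAME") or lookup.get("NAME")
        if name:
            return name
    return None


# Hypothetical usage with the data list returned by the fetch:
#   for section, part in leaf_parts(bodystructure(data[0])):
#       print(section, part[0], part[1], filename(part))
```

For the response printed above, this gives sections 1.1 and 1.2 for the TEXT/PLAIN and TEXT/HTML alternatives (no filename) and section 2 for the APPLICATION/PDF part named attiny40.pdf. Those section numbers are what you would put inside BODY[...] to fetch a single part.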

[/Edit]

You will need to change the parameter for fetch from RFC822 to BODYSTRUCTURE.

Then parse the response as described here, for example.

For example, a two part message consisting of a text and a BASE64-encoded text attachment can have a body structure of: (("TEXT" "PLAIN" ("CHARSET" "US-ASCII") NIL NIL "7BIT" 1152 23)("TEXT" "PLAIN" ("CHARSET" "US-ASCII" "NAME" "cc.diff") "<960723163407.20117h@cac.washington.edu>" "Compiler diff" "BASE64" 4554 73) "MIXED")

See also this post and this one. The last link looks pretty much like what you are trying to do.