سؤال

I have a file containing an email in "plain text MIME message format". I am not sure if this is the EML format. The email contains an attachment and I want to extract the attachment and create those files again. This is how the attachment part looks like -

...
...
Receive, deliver details
...
...
From: sac ascsac <sacsac@sacascsac.ascsac>

Date: Thu, 20 Jan 2011 18:05:16 +0530

Message-ID: <AANLkTimmSL0iGW4rA3tvSJ9M3eT5yZLTGsqvCvf2fFC3@mail.gmail.com>

Subject: Test attachments

To: ascsacsa@ascsac.com

Content-Type: multipart/mixed; boundary=20cf3054ac85d97721049a465e12



--20cf3054ac85d97721049a465e12

Content-Type: multipart/alternative; boundary=20cf3054ac85d97717049a465e10



--20cf3054ac85d97717049a465e10

Content-Type: text/plain; charset=ISO-8859-1



hello this is a test mail. It contains two attachments



--20cf3054ac85d97717049a465e10

Content-Type: text/html; charset=ISO-8859-1



hello this is a test mail. It contains two attachments<br>


--20cf3054ac85d97717049a465e10--

--20cf3054ac85d97721049a465e12

Content-Type: text/plain; charset=US-ASCII; name="simple_test.txt"

Content-Disposition: attachment; filename="simple_test.txt"

Content-Transfer-Encoding: base64

X-Attachment-Id: f_gj5n2yx60



aGVsbG8gd29ybGQKYWMgYXNj
...
encoded things here
...
ZyBmZyAKCjIKNDIzCnQ2Mwo=

--20cf3054ac85d97721049a465e12

Content-Type: application/x-httpd-php; name="oscomm_backup_code.php"

Content-Disposition: attachment; filename="oscomm_backup_code.php"

Content-Transfer-Encoding: base64

X-Attachment-Id: f_gj5n5gxn1



PD9waHAKCg ...
...
encoded things here
...
X2xpbmsoRklMRU5BTUVfQkFDS1VQKSk7Cgo/Pgo=
--20cf3054ac85d97721049a465e12--

I can see that the part between X-Attachment-Id: f_gj5n2yx60 and ZyBmZyAKCjIKNDIzCnQ2Mwo=, both including is the content of the first attachment. I want to parse those attachments (file names and contents and create those files).

I got this file after parsing a dbx format file using a DBX Parser class available in PHP classes.

I searched in many places and did not find much discussion regarding this here in SO other than Script to parse emails for attachments. May be I missed some terms while searching. In that answer it is mentioned -

you can use the boundries to extract the base64 encoded information

But I am not sure which are the boundaries and how exactly to use the boundaries? There already must be some libraries or some well defined method of doing this. I guess I will commit many mistakes if I try reinventing the wheel here.

هل كانت مفيدة؟

المحلول

There's an PHP Mailparse extension, have you tried it?

The manual way would be, process the mail line by line. When you hit your first Content-Type header (this one in your example): Content-Type: multipart/mixed; boundary=20cf3054ac85d97721049a465e12

You have the boundary. This string is used as the boundary between your multiple parts (that's why they call it multipart). Everytime a line starts with the dashes and this string, a new part begin. In your example: --20cf3054ac85d97721049a465e12

Every part will start with headers, a blank line, and content. By looking at the content-type of the headers you can determine which are attachments, what their type is and their filename. Read the whole content, strip the spaces, base64_decode it, and you've got the binary contents of the file. Does this help?

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top