Question

I'm trying to parse a mail header, but I can't figure out how to capture the part of a header that continues on a new line after a semicolon.

Example:

Content-Type: multipart/related;
    type="multipart/alternative";
    boundary="----_=_NextPart_002_01CF36FC.6259F03C"

I'm parsing this with the following regex in preg_match_all:

/(?P<keyname>.*):(\s*)?(?<value>(?:(?!;).)+)((\s*)?;([\s\\r\\n\\t]*)?(?<sub_value>.*))?/i

But this doesn't put the boundary line into sub_value. I also tried

(?<sub_value>(.+|;[\s\\r\\n\\t]*))

instead of

(?<sub_value>.*) 

but it doesn't change anything.

.+|;[\s\\r\\n\\t]*

meaning "any characters, or a semicolon followed by whitespace, newlines, or tabs".

Thanks in advance!

Edit: When I use

(?<sub_value>([\w_.=\"\/\-;\s\\r\\n\\t]*))

I get the boundary part too, but it captures more than it should. Also, does this character class include every character that can appear in a MIME boundary? And it matches a newline even when no semicolon precedes it.
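For context: the indented lines in the example are folded header lines. RFC 5322 lets a header field continue on the next line as long as that line starts with whitespace (space or tab), and "unfolding" means removing the line break in front of that whitespace. The deciding factor is the leading whitespace, not the semicolon, so a pattern keyed on "newline after a semicolon" may miss folds that happen elsewhere. A minimal PHP sketch that unfolds first, so each header becomes a single line; `unfold_headers` is a hypothetical helper name, not an existing PHP function:

```php
<?php
// Hypothetical helper: unfold header continuation lines.
// RFC 5322 unfolding removes each line break that is immediately
// followed by whitespace (space or tab); the whitespace itself is kept.
function unfold_headers(string $raw): string
{
    // \r? accepts both CRLF and bare LF line endings.
    return preg_replace('/\r?\n(?=[ \t])/', '', $raw);
}

$raw = "Content-Type: multipart/related;\n"
     . "    type=\"multipart/alternative\";\n"
     . "    boundary=\"----_=_NextPart_002_01CF36FC.6259F03C\"\n"
     . "X-Http: ok";

// The two indented lines are joined onto Content-Type; X-Http stays on its own line.
echo unfold_headers($raw), "\n";
```

After unfolding, the header can be matched line by line without the pattern having to cross line breaks.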


Solution

Try this one:

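<?php
// Sample header block; note that the last header has no trailing newline.
$headers = <<<EOT
Host: www.example.com
Content-Length: 9000
Content-Type: multipart/related;
    type="multipart/alternative";
    boundary="----_=_NextPart_002_01CF36FC.6259F03C"
X-Http: ok
EOT;

// The separator also accepts end of input ($), otherwise the last header
// (X-Http, which has no trailing newline) is silently skipped.
preg_match_all('/(?P<keyname>[a-zA-Z0-9-]+):(?P<value>.*?)(?:[\n\r;]+|$)(?P<sub_value>[\s\S]*?)(?=$|[a-zA-Z0-9-]+:)/', $headers, $match);

// Print each header name with its trimmed value and sub_value.
foreach ($match['keyname'] as $i => $key) {
    echo $key, ' => ', trim($match['value'][$i]), ' | ', trim($match['sub_value'][$i]), "\n";
}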

For an explanation of the regex, see this link. Remove the named-group syntax (?P<...>) from the regex before pasting it there.
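Once a header's value and sub_value are captured, the parameters still have to be split into names and values. A minimal sketch, assuming parameter names match `[\w.-]+` and quoted values contain no escaped quotes (the RFC grammar is broader); `parse_params` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: split a Content-Type style value into its main value
// and a lower-cased name => value map of its parameters.
// Assumptions: parameter names are [\w.-]+ and quoted values contain no
// escaped quotes. Line breaks between parameters are tolerated by \s*.
function parse_params(string $value): array
{
    // Everything before the first ';' is the main value.
    $parts = explode(';', $value, 2);
    $params = [];
    if (isset($parts[1])) {
        // Either name = "quoted value" or name = token.
        preg_match_all(
            '/([\w.-]+)\s*=\s*(?:"([^"]*)"|([^;\s]+))/',
            $parts[1],
            $sets,
            PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL
        );
        foreach ($sets as $set) {
            // Group 2 holds a quoted value, group 3 an unquoted token.
            $params[strtolower($set[1])] = $set[3] ?? $set[2];
        }
    }
    return ['value' => trim($parts[0]), 'params' => $params];
}

print_r(parse_params('multipart/related; type="multipart/alternative"; boundary="----_=_NextPart_002_01CF36FC.6259F03C"'));
```

With the `$match` array from the answer above, something like `parse_params($match['value'][$i] . ';' . $match['sub_value'][$i])` could join the two captures before splitting.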

Other tips

According to RFC 1341 and RFC 1521, I believe this regex covers all the characters that can appear in a boundary:

$regex = "/"
       . "(?P<keyname>.*)"
       . ":(\s*)?"
       . "(?<value>(?:(?!;).)+)"
       . "("
       . "(\s*)?;"
       . "([\s\\r\\n\\t]*)?"
       . "(?<sub_value>(['()+_,\-.:;?=\"\/\w\s\\r\\n\\t]*))"
       . ")?"
       . "/i";

Edit: updated the RFC link and the regex.
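RFC 1521 (and its successor, RFC 2046, section 5.1.1) states the boundary grammar explicitly, which makes it possible to check a boundary value on its own. A minimal sketch of such a check, applied after the surrounding quotes are stripped; `is_valid_boundary` is a hypothetical name. This grammar contains no `;`, `"`, tab, or line break, so the character class in the regex above is wider than a boundary alone, presumably because it also has to match the parameter syntax around it.

```php
<?php
// Hypothetical helper: check a multipart boundary against the grammar
//   boundary      := 0*69<bchars> bcharsnospace
//   bchars        := bcharsnospace / " "
//   bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / "+" / "_"
//                    / "," / "-" / "." / "/" / ":" / "=" / "?"
// The quotes around the parameter value are not part of the boundary.
function is_valid_boundary(string $boundary): bool
{
    // 0 to 69 chars (space allowed), then one final non-space char.
    // The D modifier stops $ from also matching before a trailing newline.
    $pattern = '~^[0-9A-Za-z\'()+_,\-./:=? ]{0,69}[0-9A-Za-z\'()+_,\-./:=?]$~D';
    return preg_match($pattern, $boundary) === 1;
}

var_dump(is_valid_boundary('----_=_NextPart_002_01CF36FC.6259F03C'));
```

Keeping the length limit (1 to 70 characters) and the "no trailing space" rule in the pattern mirrors the grammar directly rather than only listing allowed characters.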

Licensed under: CC-BY-SA with attribution