Question

I'm trying to run preg_match_all on the response below to extract all of the binary data. I've tried everything I can think of and can't get a single match.

I was hoping it'd be as simple as doing something like this:

preg_match_all("#\n\n(.*)\n--$boundary#",$body,$matches);

But nothing matches. I've also tried variations with \r and \n, alternation with |, and the i, s, m, and U modifiers, with no luck.

Here is a pseudo response not including the headers:

--boundary
content-type:image/jpeg

<binary data>
--boundary
content-type:image/jpeg

<binary data>
--boundary
content-type:image/jpeg

<binary data>
--boundary

Unfortunately, the binary data isn't actually enclosed in < and >; it's raw data with special characters, spread over multiple lines.

Also: I think the problem lies in the actual binary data, because when I run preg_match_all on the sample above it works fine, but when I run it on the real response, with all the binary data in it, it doesn't work.

Solution

Alternatively, you could parse the response with explode(). This should be much faster, it isn't too complex, and it gives you the header info if you want it:

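<?php

$body = file_get_contents('output.txt');
$boundary = '__NEXT_PART_gc0p4Jq0M2Yt08jU534c0p__';

// Split the response on the boundary marker.
$parts = explode("--$boundary", $body);
array_shift($parts); // drop everything before the first boundary
array_pop($parts);   // drop everything after the last boundary

$binaries = array();
foreach ($parts as $part) {
    // Headers and body are separated by the first blank line.
    // Assumes LF line endings; use "\r\n\r\n" if the data uses CRLF.
    $pieces = explode("\n\n", $part, 2);
    if (count($pieces) < 2) {
        continue; // no blank line, so no body in this part
    }
    list($header, $binary) = $pieces;
    $binaries[] = $binary;
}

print_r($binaries);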

OTHER TIPS

Line endings are platform dependent. Presumably your data is an HTTP message or an email? In that case the line breaks will be \r\n, so you need to match that instead of a bare \n.
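
If you aren't sure which line ending the data uses, the header/body split can accept both. Here is a minimal sketch of that; split_part is a hypothetical helper name, not something from the posts above, and it assumes each part starts with the line break that follows the boundary:

```php
<?php
// Hypothetical helper: splits one MIME part into [headers, body],
// accepting either CRLF or LF line endings.
function split_part(string $part): array
{
    // The header block ends at the first blank line: \r\n\r\n or \n\n.
    $pieces = preg_split('/\r?\n\r?\n/', $part, 2);
    if (count($pieces) < 2) {
        // No blank line found: treat the whole part as headers.
        return [trim($part, "\r\n"), ''];
    }
    // Strip the line break left over from the boundary line.
    return [ltrim($pieces[0], "\r\n"), $pieces[1]];
}
```

The limit of 2 matters: binary data can itself contain blank lines, and only the first one marks the end of the headers.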

Your expression seems to work fine for me on the data you provided. I pulled down your output.php, renamed it to output.txt, then ran this script:

<?php

$body = file_get_contents('output.txt');
$boundary = '__NEXT_PART_gc0p4Jq0M2Yt08jU534c0p__';
preg_match_all("#\n\n(.*)\n--$boundary#",$body,$matches);
print_r($matches);

It seems to have worked fine, i.e. it printed this:

Array
(
    [0] => Array
        (
            [0] => 

    [body] => 
--__NEXT_PART_gc0p4Jq0M2Yt08jU534c0p__
            [1] => 

ÿ( RAW IMAGE DATA CONTINUES OVER MULTIPLE LINES starts with "ÿ" ends with "ÿÙ" )ÿÙ
--__NEXT_PART_gc0p4Jq0M2Yt08jU534c0p__
            [2] => 

ÿ( RAW IMAGE DATA CONTINUES OVER MULTIPLE LINES starts with "ÿ" ends with "ÿÙ" )ÿÙ
--__NEXT_PART_gc0p4Jq0M2Yt08jU534c0p__
            [3] => 

ÿ( RAW IMAGE DATA CONTINUES OVER MULTIPLE LINES starts with "ÿ" ends with "ÿÙ" )ÿÙ
--__NEXT_PART_gc0p4Jq0M2Yt08jU534c0p__
            [4] => 

ÿ( RAW IMAGE DATA CONTINUES OVER MULTIPLE LINES starts with "ÿ" ends with "ÿÙ" )ÿÙ
--__NEXT_PART_gc0p4Jq0M2Yt08jU534c0p__
        )

    [1] => Array
        (
            [0] =>     [body] => 
            [1] => ÿ( RAW IMAGE DATA CONTINUES OVER MULTIPLE LINES starts with "ÿ" ends with "ÿÙ" )ÿÙ
            [2] => ÿ( RAW IMAGE DATA CONTINUES OVER MULTIPLE LINES starts with "ÿ" ends with "ÿÙ" )ÿÙ
            [3] => ÿ( RAW IMAGE DATA CONTINUES OVER MULTIPLE LINES starts with "ÿ" ends with "ÿÙ" )ÿÙ
            [4] => ÿ( RAW IMAGE DATA CONTINUES OVER MULTIPLE LINES starts with "ÿ" ends with "ÿÙ" )ÿÙ
        )

)

It looks like $matches[1] contains the list of binary data you're after.

I don't have an answer regarding your regular expressions, but did you have a look at Zend_Mime?

Ok, well I'm not all that familiar with PHP regular expressions...

Considering what you are trying to do, the dot-matches-newline s switch should work. Using this regular expression seemed to work on my end:

/<binary data>\r\n(.*?)\r\n--simple boundary/s

The *? quantifier is non-greedy, so it consumes only as much as needed to reach the first --simple boundary string that follows.
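
In PHP that idea might look roughly like the sketch below. extract_bodies is a hypothetical helper name, and the pattern assumes each part has a header block followed by a blank line. It also runs the boundary through preg_quote() and checks for a false return, since preg_match_all() can fail without matching anything, for example when a PCRE limit is hit on large inputs:

```php
<?php
// Hypothetical helper: returns the body of every part delimited by $boundary.
function extract_bodies(string $body, string $boundary): array
{
    // preg_quote() escapes regex metacharacters in the boundary; '#' is
    // passed as well because it is the pattern delimiter used below.
    $b = preg_quote($boundary, '#');

    // s modifier: the dot also matches newlines.
    // (.*?) is non-greedy, so each match stops at the nearest boundary.
    // \r?\n accepts both CRLF and LF line endings.
    $pattern = "#\\r?\\n\\r?\\n(.*?)\\r?\\n--{$b}#s";

    $count = preg_match_all($pattern, $body, $matches);
    if ($count === false) {
        // The match itself failed; preg_last_error() says why.
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }
    return $matches[1];
}
```

If the error code turns out to be a backtrack or JIT stack limit, splitting on the boundary with explode() avoids the regex engine entirely.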

Your line endings may differ from mine (I'm on a Windows machine), so you may have to fire up a hex editor to see exactly what should be matched before and after the <binary data> content.
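
If a hex editor isn't handy, PHP can show the same bytes. This is a minimal sketch with a hypothetical helper, hex_context, which dumps a few bytes on either side of the first occurrence of a marker such as the boundary string:

```php
<?php
// Hypothetical helper: hex-dumps the bytes just before and just after the
// first occurrence of $needle, so the surrounding line endings are visible.
function hex_context(string $data, string $needle, int $radius = 4): string
{
    $pos = strpos($data, $needle);
    if ($pos === false) {
        return ''; // marker not present in the data
    }
    $start  = max(0, $pos - $radius);
    $before = substr($data, $start, $pos - $start);
    $after  = substr($data, $pos + strlen($needle), $radius);
    return bin2hex($before) . ' | ' . bin2hex($after);
}
```

In the output, 0d0a is a CRLF pair and a lone 0a is a bare LF, which tells you whether the pattern needs \r\n or \n around the boundary.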

Licensed under: CC-BY-SA with attribution