Question

I have scraped some JavaScript (using simple_html_dom) and this is what I've come up with:

Contents of $MyScrape

<script type="text/javascript">
var initialInfo = [
    [
        [29, 30, 'bb1', 'bb2', '02/15/2013 20:00:00', '02/15/2013 00:00:00', 6, 'AT', '1 : 1', '2 : 3', , , '2 : 3'],
        [
            [29, 'bb1', 6.91, [
                    [
                        ['pears', [4]],
                        ['kiwis', [20]]
                    ]
                ],
                [
                    [36849, 'abcdefg', 6.24, [
                        [
                            ['apples', [3]],
                            ['oranges', [0]]
                        ]
                    ], 5, 'iff', 29, 2, 88, 'abc', 23, 180, 76]
                ],
                ['4231', [
                    [5, 1],
                    [7, 7]
                ]]
            ]
        ]
    ], 0
];
</script>

I am trying to get the contents of initialInfo into a PHP variable so I can do this:

$str = ????;
$jsonarray = json_decode($str, true);

foreach($jsonarray as $row)
{
    $id = $row[0][0]; //29
    $tc = $row[0][1]; //30
    $ab = $row[0][2]; //bb1
}

Does anyone have an idea how I can do this (preferably simply)?


Solution 4

OK, here's what I did to get this working:

//Cut out the non-JSON stuff
  $marker = 'initialInfo = ';
  $start = strpos($MyScrape, $marker) + strlen($marker);
  $end = strpos($MyScrape, '</script>');
  $data = substr($MyScrape, $start, ($end - $start));

//Convert the new string to valid JSON (as it's not quite right)

//Make single quotes into double quotes so that JSON can read it.
  $fixedJSON = str_replace("'", '"', $data);
//Fill the empty array slots (consecutive commas, with or without spaces between them) with 0 so JSON can read it.
  $fixedCommas = preg_replace('/,\s*(?=,)/', ', 0', $fixedJSON);
//Remove the ending semicolon (and surrounding whitespace) as JSON can't read it.
  $removedSemiColon = rtrim(trim($fixedCommas), ';');

$jsonarray = json_decode($removedSemiColon, true);

//Now I can actually get stuff out of it...
  $row = $jsonarray[0];
  echo $row[0][0]; //29
  echo $row[0][1]; //30
  echo $row[0][2]; //bb1
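The same steps can be wrapped in one self-contained function, sketched below. The name scrape_initial_info is hypothetical, and it assumes the page contains exactly "initialInfo = " followed by the literal and a closing </script> tag. Instead of a fixed ",,," replacement it uses a regular expression for the empty slots, so it also works when the commas are separated by spaces, as in the pretty-printed listing in the question. Like the approach above, the blanket quote replacement will break any string that contains an apostrophe.

```php
<?php
// Hypothetical helper wrapping the extract / fix / decode steps.
// Assumes the page contains "initialInfo = " followed by the literal and a </script> tag.
function scrape_initial_info(string $html): ?array
{
    $marker = 'initialInfo = ';
    $start = strpos($html, $marker);
    if ($start === false) {
        return null; // variable not found
    }
    $start += strlen($marker);
    $end = strpos($html, '</script>', $start);
    if ($end === false) {
        return null; // no closing tag after the variable
    }
    $data = substr($html, $start, $end - $start);

    // Single quotes -> double quotes (breaks any string containing an apostrophe).
    $data = str_replace("'", '"', $data);
    // A comma followed only by whitespace and another comma starts an empty slot: fill it with 0.
    $data = preg_replace('/,\s*(?=,)/', ', 0', $data);
    // Drop surrounding whitespace and the trailing semicolon.
    $data = rtrim(trim($data), ';');

    $decoded = json_decode($data, true);
    return is_array($decoded) ? $decoded : null;
}

$html = "<script type=\"text/javascript\">\nvar initialInfo = [\n  [[29, 30, 'bb1', 'bb2', '1 : 1', , , '2 : 3']], 0\n];\n</script>";
$row = scrape_initial_info($html)[0];
echo $row[0][0], ' ', $row[0][1], ' ', $row[0][2], "\n"; // prints: 29 30 bb1
```

Returning null on any failure keeps the caller's check simple; if you need to know why decoding failed, call json_last_error_msg() right after json_decode() inside the function.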

OTHER TIPS

To treat it as JSON, you have to fix a few things:

  • JSON uses double quotes, not single quotes.
  • The consecutive commas marking empty array slots in ..., '1 : 1', '2 : 3', , , '2 : 3'], aren't valid JSON.
  • You have to trim off the variable declaration (var initialInfo =).
  • You have to trim off that ending semicolon.
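A quick way to confirm that each of these points matters is to feed small samples to json_decode, which returns null for each JavaScript-only construct and succeeds once it is fixed. The sample strings below are made up for illustration; this is a sketch for checking your own conversion.

```php
<?php
// Made-up samples: each pair is [JavaScript-only form, fixed JSON form].
$cases = [
    'single quotes'        => ["['bb1']", '["bb1"]'],
    'empty array slots'    => ['["a", , , "b"]', '["a", 0, 0, "b"]'],
    'variable declaration' => ['var initialInfo = [1]', '[1]'],
    'trailing semicolon'   => ['[1];', '[1]'],
];

foreach ($cases as $name => [$broken, $fixed]) {
    $before = json_decode($broken, true);
    $error = json_last_error_msg(); // read now, the next json_decode() call resets it
    $after = json_decode($fixed, true);
    echo $name, ': ', $before === null ? "fails ($error)" : 'decodes',
        ' / fixed: ', $after === null ? 'fails' : 'decodes', "\n";
}
```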

You could also write your own parser, as this code uses only array literals, strings, and numbers.
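A minimal sketch of such a parser, assuming the literal really is limited to nested arrays, single- or double-quoted strings, and plain decimal numbers (no exponents, objects, or identifiers). JsArrayParser is a hypothetical class. Empty slots become null, mirroring the holes in the JavaScript array rather than the 0 used by the string-replacement approach, and escape sequences are handled only by dropping the backslash.

```php
<?php
// Hypothetical minimal recursive-descent parser for JavaScript array literals
// made of nested arrays, quoted strings and decimal numbers only.
// Empty slots (", ,") become null; a trailing comma is ignored, as in JavaScript.
final class JsArrayParser
{
    private string $s;
    private int $i = 0;

    private function __construct(string $source)
    {
        $this->s = $source;
    }

    public static function parse(string $source)
    {
        $p = new self($source);
        $value = $p->value();
        $p->skipWs();
        if ($p->i !== strlen($p->s)) {
            throw new RuntimeException("Unexpected input at offset {$p->i}");
        }
        return $value;
    }

    private function skipWs(): void
    {
        $this->i += strspn($this->s, " \t\r\n", $this->i);
    }

    private function value()
    {
        $this->skipWs();
        $c = $this->s[$this->i] ?? '';
        if ($c === '[') {
            return $this->arrayLiteral();
        }
        if ($c === "'" || $c === '"') {
            return $this->stringLiteral($c);
        }
        // Anchored (/A) match at the current offset: optional minus, digits, optional fraction
        if (preg_match('/-?\d+(\.\d+)?/A', $this->s, $m, 0, $this->i)) {
            $this->i += strlen($m[0]);
            return isset($m[1]) ? (float) $m[0] : (int) $m[0];
        }
        throw new RuntimeException("Unexpected character at offset {$this->i}");
    }

    private function arrayLiteral(): array
    {
        $this->i++; // skip '['
        $items = [];
        while (true) {
            $this->skipWs();
            $c = $this->s[$this->i] ?? '';
            if ($c === ']') {
                $this->i++;
                return $items;
            }
            if ($c === ',') { // a comma where a value should be marks an empty slot
                $items[] = null;
                $this->i++;
                continue;
            }
            $items[] = $this->value();
            $this->skipWs();
            $c = $this->s[$this->i] ?? '';
            if ($c === ',') {
                $this->i++; // separator after a value
            } elseif ($c !== ']') {
                throw new RuntimeException("Expected ',' or ']' at offset {$this->i}");
            }
        }
    }

    private function stringLiteral(string $quote): string
    {
        $this->i++; // skip the opening quote
        $out = '';
        $len = strlen($this->s);
        while ($this->i < $len) {
            $c = $this->s[$this->i++];
            if ($c === $quote) {
                return $out;
            }
            if ($c === '\\' && $this->i < $len) {
                $c = $this->s[$this->i++]; // drop the backslash, keep the next character
            }
            $out .= $c;
        }
        throw new RuntimeException('Unterminated string literal');
    }
}

$data = JsArrayParser::parse("[[29, 30, 'bb1', '2 : 3', , , '2 : 3'], [6.91, ['pears', [4]]]]");
echo $data[0][2], ' ', $data[1][1][0], "\n"; // prints: bb1 pears
```

Because it reads the literal directly instead of rewriting the text, it avoids the apostrophe problem of the quote-replacement approach, at the cost of more code.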

This might get you the string:

function js_array($str, $array_name)
{
    // Capture the array literal between "name =" and the "];" that ends the statement
    $pattern = '/' . preg_quote($array_name, '/') . '\s*=\s*(\[[\s\S]*?\])\s*;/';

    if (preg_match($pattern, $str, $matches)) {
        return $matches[1];
    }

    return null; // variable not found
}
$str = js_array($MyScrape, 'initialInfo');

Then you could try json_decode, as you mentioned, after fixing the quotes and empty slots listed above.

Let me know if it works (or not)!

It may not work in older browsers, but if you control the page you could call JSON.stringify(initialInfo), store the result in a hidden field, and then read that field with PHP.

Licensed under: CC-BY-SA with attribution