Question

I am using strip_tags to strip the tags from an XML file. It works fine when the array is small, but when the page is big it always crashes. Here is my script, which works for up to 100 values but crashes for bigger ones:

    preg_match_all("/<image:caption>.*?<\/image:caption>|<image:loc>.*?<\/image:loc>|<loc>.*?<\/loc>/", $str, $results);
    $arr = array_chunk(array_map('strip_tags', $results[0]), 1000);

    for ($i = 0; $i < 1000; $i++) {
        for ($j = 0; $j < 1000; $j++) {
            $output = $arr[$i][$j] . '</br>';
            echo $output;
        }
    }

It strips values like these correctly, but for a bigger file it crashes:

    <urlset>
        <url><loc>/1366x768/citroen-ds-cabrio-auto-car-wallshark-com-228615.html</loc><image:image><image:loc>s/1366x768/citroen-ds/228615/citroen-ds-cabrio-auto-car-wallshark-com-228615.jpg</image:loc><image:caption>Citroen Ds Cabrio Auto Car Wallshark Com  Walpapers</image:caption></image:image></url>
        <url><loc>/1366x768/citroen-ds-cars-citro-n-cabrio-213157.html</loc><image:image><image:loc>s/1366x768/citroen-ds/213157/citroen-ds-cars-citro-n-cabrio-213157.jpg</image:loc><image:caption>Citroen Ds Cars Citro N Cabrio  Walpapers</image:caption></image:image></url>
        <url><loc>/1366x768/citroen-ds-citro-n-pictures-95569.html</loc><image:image><image:loc>s/1366x768/citroen-ds/95569/citroen-ds-citro-n-pictures-95569.jpg</image:loc><image:caption>Citroen Ds Citro N Pictures  Walpapers</image:caption></image:image></url>
    </urlset>
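Two things in the script above are worth ruling out, although the thread does not confirm either as the cause of the crash. First, `preg_match_all()` returns `false` on failure (for example when `pcre.backtrack_limit` is exceeded on a large input), and the script never checks for that. Second, the nested loops always run 1000 × 1000 = 1,000,000 times no matter how many values were matched, so most iterations read offsets that do not exist. A minimal sketch that guards against both, assuming `$str` holds the XML text (a short inline sample stands in for it here) and PHP 8.0+ for `preg_last_error_msg()`:

```php
<?php
// Assumption: $str holds the sitemap XML; this one-record sample stands in for it.
$str = '<urlset><url><loc>/a.html</loc><image:image>'
     . '<image:loc>s/a.jpg</image:loc><image:caption>A caption</image:caption>'
     . '</image:image></url></urlset>';

$pattern = '/<image:caption>.*?<\/image:caption>|<image:loc>.*?<\/image:loc>|<loc>.*?<\/loc>/';

// preg_match_all() returns false (not 0) when PCRE reports an error,
// e.g. a backtrack or recursion limit hit on a very large input.
if (preg_match_all($pattern, $str, $results) === false) {
    exit('Regex failed: ' . preg_last_error_msg() . PHP_EOL);
}

$values = array_map('strip_tags', $results[0]);

// Visit only the values that were actually matched,
// instead of a fixed 1000 x 1000 grid of indexes.
foreach ($values as $value) {
    // <br> is the valid line-break tag; </br> is not.
    echo htmlspecialchars($value), '<br>', PHP_EOL;
}
```

With no chunking and no hard-coded bounds, the number of iterations equals the number of matches.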

Solution

Instead of a regex plus strip_tags, you can parse the file with DOMDocument and insert the values into a database:

    <?php

    $dom = new DOMDocument();
    // The @ hides parser warnings; remove it while debugging.
    @$dom->load('Remotefile.xml');

    $urls   = $dom->getElementsByTagName('url');
    $result = [];

    foreach ($urls as $url) {
        $image = $url->getElementsByTagName('image')->item(0);
        // Positional access: assumes no whitespace text nodes inside <image:image>.
        $imageChildren = $image->childNodes;

        $result[] = array(
            'loc'    => $url->getElementsByTagName('loc')->item(0)->textContent,
            'imgloc' => $imageChildren->item(0)->textContent,
            'imgcap' => $imageChildren->item(1)->textContent,
        );
    }

    // $dbh is assumed to be an existing PDO connection.
    $stmt = $dbh->prepare("INSERT INTO urls (loc, imageloc, imagecap) VALUES (:loc, :imgloc, :imgcap)");

    foreach ($result as $res) {
        $stmt->bindParam(':loc',    $res['loc']);
        $stmt->bindParam(':imgloc', $res['imgloc']);
        $stmt->bindParam(':imgcap', $res['imgcap']);
        $stmt->execute();
    }
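The `childNodes->item(0)` / `item(1)` lookups above depend on there being no whitespace between the `<image:...>` elements; in a pretty-printed file the first child is a whitespace text node, because DOMDocument keeps whitespace by default. A sketch that selects each value by name with `DOMXPath` instead. It assumes the real sitemap declares the standard sitemap and image namespaces (the sample in the question leaves them out); the inline `$xml` string stands in for `Remotefile.xml`, and the `sm` prefix is an arbitrary name chosen here:

```php
<?php
// Assumption: the real file declares these namespaces, as sitemaps normally do.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>/a.html</loc>
    <image:image>
      <image:loc>s/a.jpg</image:loc>
      <image:caption>A caption</image:caption>
    </image:image>
  </url>
</urlset>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml); // use $dom->load('Remotefile.xml') for the real file

$xp = new DOMXPath($dom);
// XPath 1.0 has no default namespace, so the sitemap namespace needs a prefix.
$xp->registerNamespace('sm', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$xp->registerNamespace('image', 'http://www.google.com/schemas/sitemap-image/1.1');

$result = [];
foreach ($xp->query('/sm:urlset/sm:url') as $url) {
    // string(...) yields '' for a missing element instead of a null node.
    $result[] = [
        'loc'    => $xp->evaluate('string(sm:loc)', $url),
        'imgloc' => $xp->evaluate('string(image:image/image:loc)', $url),
        'imgcap' => $xp->evaluate('string(image:image/image:caption)', $url),
    ];
}
print_r($result);
```

Because `string(...)` returns an empty string when an element is absent, a `<url>` without an image yields empty fields rather than an error from calling a method on `null`.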

Alternatively, with a regex:

    $pattern = <<<'LOD'
    ~
      <url>                                                \s*+
      <loc>           (?<loc>    [^<]++ ) </loc>           \s*+
      <image:image>                                        \s*+
      <image:loc>     (?<imgloc> [^<]++ ) </image:loc>     \s*+
      <image:caption> (?<imgcap> [^<]++ ) </image:caption> \s*+
      </image:image>                                       \s*+
      </url>
    ~x
    LOD;

    preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);

    /* Purely cosmetic and not needed for the extraction: drop the
       numeric-key duplicates so print_r() shows only the named groups. */
    foreach ($matches as &$match) {
        foreach ($match as $k => $m) {
            if (is_int($k)) unset($match[$k]);
        }
    }
    unset($match); // break the reference left over from the foreach
    print_r($matches);

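The cosmetic cleanup above edits `$matches` through a reference. A sketch of the same cleanup without references, using `array_filter()` with `ARRAY_FILTER_USE_KEY` to keep only the string (named-group) keys. It assumes PHP 7.4+ for the arrow function and uses a one-record inline `$str` as sample input:

```php
<?php
// Sample input standing in for the real sitemap text.
$str = '<url><loc>/a.html</loc><image:image><image:loc>s/a.jpg</image:loc>'
     . '<image:caption>A caption</image:caption></image:image></url>';

$pattern = <<<'LOD'
~
  <url>                                                \s*+
  <loc>           (?<loc>    [^<]++ ) </loc>           \s*+
  <image:image>                                        \s*+
  <image:loc>     (?<imgloc> [^<]++ ) </image:loc>     \s*+
  <image:caption> (?<imgcap> [^<]++ ) </image:caption> \s*+
  </image:image>                                       \s*+
  </url>
~x
LOD;

preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);

// Each set holds both numeric and named keys; keep only the named ones.
$rows = array_map(
    fn(array $m): array => array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY),
    $matches
);
print_r($rows);
```

Each element of `$rows` then has exactly the `loc`, `imgloc` and `imgcap` keys, which can be fed straight into the prepared statement from the DOM version.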
Licensed under: CC-BY-SA with attribution