Looking for a regex to pull img src information from inside of inline Javascript using PHP [closed]

StackOverflow https://stackoverflow.com/questions/16599896

  •  29-05-2022

Question

I'm using PHP to scrape a few websites. The image information is contained within a script.

<body>
  <div>something</div>
  <div>Something else</div>
  <script type="text/javascript" language="javascript">
      var imgs = ['<img alt="image1" class="happy-image" src="http://example.com/image1.jpg" title = "Image 1">, <img alt="image2" class="happy-image" src="http://example.com/image2.jpg" title = "Image 2">'];

  </script>
</body>

I would like to use PHP to extract the information associated with these images from this string, but I don't know where to begin writing a regex for it.

Solution

Your safest bet would be to parse the HTML with DOMDocument, extract the script's contents, then parse that as HTML. This will give you access to the images. Like so:

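// Suppress libxml warnings triggered by imperfect real-world markup.
libxml_use_internal_errors(true);

// Parse the page and grab the first <script> element.
$dom = new DOMDocument();
$dom->loadHTML($your_html_here);
$script = $dom->getElementsByTagName('script')->item(0);

// Parse the script's text as HTML in a separate document.
$inner = new DOMDocument();
$inner->loadHTML($script->nodeValue);
foreach ($inner->getElementsByTagName('img') as $img) {
    $src = $img->getAttribute('src');
    // do something with $src (alt and title can be read the same way)
}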
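
If the page has more than one script block, or you want the alt and title as well as the src, the same idea can be wrapped in a function. The following is only a sketch: the name extractScriptImages and the shape of the returned array are made up for illustration, and it assumes the image tags appear literally in the script text (no escaped quotes or string concatenation).

```php
<?php
// Hypothetical helper (not part of the original answer): collects src, alt
// and title for every <img> tag found inside any <script> element.
function extractScriptImages(string $html): array
{
    // Collect libxml warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $page = new DOMDocument();
    $page->loadHTML($html);

    $images = [];
    foreach ($page->getElementsByTagName('script') as $script) {
        $code = trim($script->textContent);
        if ($code === '') {
            continue; // external or empty script, nothing to parse
        }

        // Re-parse the script text as HTML so the embedded tags become nodes.
        $fragment = new DOMDocument();
        $fragment->loadHTML($code);

        foreach ($fragment->getElementsByTagName('img') as $img) {
            $images[] = [
                'src'   => $img->getAttribute('src'),
                'alt'   => $img->getAttribute('alt'),
                'title' => $img->getAttribute('title'),
            ];
        }
    }

    // Restore the previous libxml error handling.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $images;
}

// Usage with markup shaped like the question's sample.
$sample = '<body><script type="text/javascript">'
    . 'var imgs = [\'<img alt="image1" src="http://example.com/image1.jpg" title = "Image 1">, '
    . '<img alt="image2" src="http://example.com/image2.jpg" title = "Image 2">\'];'
    . '</script></body>';

foreach (extractScriptImages($sample) as $image) {
    echo $image['src'], ' | ', $image['title'], PHP_EOL;
}
```

Using a second DOMDocument for the script text keeps the page document intact, so you can keep querying it after the images have been collected.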
Licensed under: CC-BY-SA with attribution