To convert the entire HTML snippet to TCPDF, as you mentioned in your comment, you'll need to parse the snippet with DOMDocument and loop through each child node, deciding how to handle each one.
The catch is that the snippet you've provided above isn't a complete HTML document, so DOMDocument will wrap it in <html> and <body> tags when parsing it, loading the following structure internally:
<html>
<body>
Some text
<p>A paragraph</p>
<img src="image1.jpg" width="200" height="200">
More text
<img src="image2.jpg" width="200" height="200">
</body>
</html>
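If you want to see the wrapping for yourself, here is a minimal sketch. The helper name body_child_names() is made up for illustration, and the exact list of children can differ slightly between libxml versions, so inspect the result rather than assume it.

```php
<?php
// Hypothetical helper (not part of DOMDocument): parses an HTML fragment and
// returns the node names of the direct children of the implied <body>.
function body_child_names(string $fragment): array
{
    $doc = new DOMDocument();

    // Fragments can trigger libxml warnings; collect them instead of printing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($fragment);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // DOMDocument adds <html> and <body> around the fragment.
    $body = $doc->getElementsByTagName('body')->item(0);

    $names = array();
    foreach ($body->childNodes as $child) {
        $names[] = $child->nodeName; // e.g. 'p', 'img', '#text'
    }
    return $names;
}

print_r(body_child_names('<p>A paragraph</p><img src="image1.jpg" width="200" height="200">'));
```

Text that sits directly inside the body shows up as '#text' nodes, which is worth remembering when you write the loop that processes the children.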
This caveat is easily worked around, however, by building on @hakre's answer in the thread I linked to below. My recommendation would be something along the lines of the following:
// Load the snippet into a DOMDocument
$doc = new DOMDocument();
$doc->loadHTML($content);

// Use DOMXPath to retrieve the body content of the snippet
$xpath = new DOMXPath($doc);
$data = $xpath->evaluate('//html/body');

// <body> is the first item in $data, so for readability we do this
// (item(0) also works on PHP versions where $data[0] is not supported)
$body = $data->item(0);

// Now we loop through the elements in your original snippet
foreach ($body->childNodes as $node) {
    switch ($node->nodeName) {
        case 'img':
            // Get the value of the src attribute from the img element
            $src = $node->getAttribute('src');
            // $y_offset is the vertical position you track in your own code
            $this->Image($src, PDF_MARGIN_LEFT, $y_offset, 116, 85);
            break;
        default:
            // Pass the node's content to TCPDF as a normal paragraph
            break;
    }
}
This way, you can easily add additional case 'blah': blocks to handle other elements which may appear in your $content snippets, and the content will be processed in the correct order without breaking the original flow of the text. :)
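As a rough sketch of what extra case blocks might look like, the function below collects an ordered list of steps instead of calling TCPDF, so it can be run on its own. The name snippet_to_steps() and the step labels ('image', 'paragraph', 'text', 'html') are assumptions made up for this example; in your class you would call the relevant TCPDF methods where the steps are collected.

```php
<?php
// Hypothetical helper: walks the children of the implied <body> in document
// order and records what should be rendered for each one.
function snippet_to_steps(string $content): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);

    $steps = array();
    foreach ($body->childNodes as $node) {
        switch ($node->nodeName) {
            case 'img':
                // Here you would call $this->Image(...) with the src value.
                $steps[] = array('image', $node->getAttribute('src'));
                break;
            case 'p':
                $steps[] = array('paragraph', trim($node->textContent));
                break;
            case '#text':
                // Bare text between elements; skip whitespace-only nodes.
                $text = trim($node->nodeValue);
                if ($text !== '') {
                    $steps[] = array('text', $text);
                }
                break;
            default:
                // Anything else is kept as serialized HTML for later handling.
                $steps[] = array('html', $doc->saveHTML($node));
                break;
        }
    }
    return $steps;
}

print_r(snippet_to_steps('<p>A paragraph</p><img src="image1.jpg" width="200" height="200"><p>End</p>'));
```

Because the steps are appended in the order the nodes appear, images stay between the paragraphs they were written between.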
-- Original answer. This will work if you just want to extract the image sources and process them elsewhere, independently of the rest of the content.
You can match all the <img> tags in your $content string by using a regular expression:
/<img(?:[\s\w="]+)src="([^"]+)"(?:[\s\w="]*)\/?>/i
A live breakdown of the regex which you can play with to see how it works is here: http://regex101.com/r/tS5xY9
You can use this regex with preg_match_all() to retrieve all of the image tags from within your $content variable as follows:
$matches = array();
$num = preg_match_all('/<img(?:[\s\w="]+)src="([^"]+)"(?:[\s\w="]*)\/?>/i', $content, $matches, PREG_SET_ORDER);
The PREG_SET_ORDER constant tells preg_match_all() to group its results by match, which is easier to loop through when producing output: each top-level index of the array (i.e., $matches[0], $matches[1], etc.) holds the full matched text followed by the captured groups for one match. In the case of the regex above, $matches[0] will contain the following:
array(
    0 => '<img src="image1.jpg" width="200" height="200">',
    1 => 'image1.jpg',
)
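To make the difference between the two result layouts concrete, here is a small comparison using the same regex. The sample $content string and the variable names $byPattern and $bySet are just for this illustration.

```php
<?php
$content = 'Some text <img src="image1.jpg" width="200" height="200"> more text '
         . '<img src="image2.jpg" width="200" height="200">';
$regex = '/<img(?:[\s\w="]+)src="([^"]+)"(?:[\s\w="]*)\/?>/i';

// Default layout: $byPattern[0] holds all full matches,
// $byPattern[1] holds all values captured by the first group.
preg_match_all($regex, $content, $byPattern, PREG_PATTERN_ORDER);

// Set layout: $bySet[0] is the first match (full text, then its groups),
// $bySet[1] is the second match, and so on.
preg_match_all($regex, $content, $bySet, PREG_SET_ORDER);

echo $byPattern[1][0], PHP_EOL; // src of the first image
echo $bySet[0][1], PHP_EOL;     // the same value, reached through the set layout
```

Both calls find the same matches; only the nesting of the resulting array changes.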
You can now loop through $matches as $key => $match and pass $match[1] to your $this->Image() method.
Alternatively, if you don't want to loop through, you can just access each src attribute directly from $matches as $matches[0][1], $matches[1][1], etc.
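If all you want is a flat list of the src values, one option (not part of the original answer, just a convenience) is to let array_column() pull index 1 out of every match. The function name extract_image_sources() is hypothetical.

```php
<?php
// Hypothetical helper: returns the src value of every <img> tag the regex matches.
function extract_image_sources(string $content): array
{
    $regex = '/<img(?:[\s\w="]+)src="([^"]+)"(?:[\s\w="]*)\/?>/i';

    $matches = array();
    preg_match_all($regex, $content, $matches, PREG_SET_ORDER);

    // Each $matches[n] is array(0 => full tag, 1 => src), so column 1 is the src.
    return array_column($matches, 1);
}

$content = 'Some text <img src="image1.jpg" width="200" height="200"> more text '
         . '<img src="image2.jpg" width="200" height="200">';

foreach (extract_image_sources($content) as $src) {
    // Inside your TCPDF subclass this is where $this->Image($src, ...) would go.
    echo $src, PHP_EOL;
}
```

When the snippet contains no matching tags the function returns an empty array, so the loop simply does nothing.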
If you need to be able to access the other attributes within the tags, then I recommend using the DOMDocument method provided by @hakre on Get img src with PHP. If you just need to access the src attribute, then using preg_match_all() is generally the lighter option, as it does not need to load the entire DOM of the snippet into memory as objects to provide you with the data you need.
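For completeness, here is a rough sketch of the DOMDocument route for reading every attribute of each image. It is my own illustration rather than @hakre's code, and the function name extract_image_attributes() is made up.

```php
<?php
// Hypothetical helper: returns one associative array of attributes per <img>.
function extract_image_attributes(string $content): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $images = array();
    foreach ($doc->getElementsByTagName('img') as $img) {
        $attributes = array();
        // $img->attributes is a DOMNamedNodeMap of DOMAttr nodes.
        foreach ($img->attributes as $attr) {
            $attributes[$attr->nodeName] = $attr->nodeValue;
        }
        $images[] = $attributes;
    }
    return $images;
}

print_r(extract_image_attributes('<p>Text</p><img src="image1.jpg" width="200" height="200">'));
```

Unlike the regex, this does not care about attribute order or about which characters appear inside the attribute values.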