Question

I'm using PHP's DOM extension to build an HTML document.

At the end of the document, I create a script element.

If the script contains special characters, specifically < and >, they are converted to the entities &lt; and &gt;.

This is obviously a problem if I have any strings containing those characters (or, in my case, regexes).

Is there a non-hackish way (i.e. NOT string replacement) to prevent this behaviour in script tags ONLY?


Solution

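This is normally not a problem. Those characters are only encoded as &lt; and &gt; if you serialize the document with DOMDocument::saveXML(). If you serialize it with DOMDocument::saveHTML(), they are written as literal < and > inside a <script> element. The example wraps the code in a CDATA section, so the saveXML() output keeps it intact as well.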

Example:

<?php
/**
 * PHP DOM and JavaScript with HTML entities
 *
 * @link http://stackoverflow.com/q/18487515/367456
 */

$doc = new DOMDocument("1.0");
$doc->loadXML('<head/>');

$javascriptCode = "\n  if (1 < 4) {\n    alert(\"hello\");\n  }\n";

$script = $doc->createElement('script');
$script->appendChild($doc->createCDATASection($javascriptCode));

$head         = $doc->getElementsByTagName('head')->item(0);
$scriptInHead = $head->appendChild($script);

echo 'libxml: ', LIBXML_DOTTED_VERSION, "\n"
    , "\nXML:\n", $doc->saveXML()
    , "\nHTML:\n", $doc->saveHTML()
;

Program output:

libxml: 2.7.8

XML:
<?xml version="1.0"?>
<head><script><![CDATA[
  if (1 < 4) {
    alert("hello");
  }
]]></script></head>

HTML:
<head><script>
  if (1 < 4) {
    alert("hello");
  }
</script></head>
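
For comparison, here is a minimal sketch of the case described in the question, assuming the script was added as a plain text node rather than a CDATA section. The JavaScript string and variable names are illustrative only, and passing a node to saveHTML() assumes PHP 5.3.6 or later. It should show the XML serializer escaping the characters while the HTML serializer leaves the <script> content untouched:

```php
<?php
// Sketch: the same kind of script, added as a plain text node
// instead of a CDATA section (illustrative names and values).
$doc = new DOMDocument('1.0');
$doc->loadXML('<head/>');

$javascriptCode = 'if (a < b && b > 0) { alert("hello"); }';

$script = $doc->createElement('script');
// createTextNode() stores the string as-is; escaping is decided
// later, by whichever serializer is used.
$script->appendChild($doc->createTextNode($javascriptCode));
$doc->documentElement->appendChild($script);

$xml  = $doc->saveXML($script);  // XML serializer: <, > and & are escaped
$html = $doc->saveHTML($script); // HTML serializer: <script> content is written raw

echo "XML:\n", $xml, "\n\nHTML:\n", $html, "\n";
```

If the output still contains &lt; inside the script, it is worth checking that the final serialization really goes through saveHTML() and not saveXML().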
Licensed under: CC-BY-SA with attribution