Question

I'm having a problem with HTML Purifier: it strips the id attributes from heading elements even though I've set the configuration options that should allow them.

Right now I'm using:

// set up HTML Purifier for user inputs
require_once 'htmlpurifier/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'UTF-8');
$config->set('HTML.Doctype', 'HTML 4.01 Transitional');
$config->set('Attr.EnableID', true);
$config->set('HTML.Trusted', true);

$purifier = new HTMLPurifier($config);

I then feed it a string like:

<h6 id="1843804297">This is a title</h6><h5 id="1979691494">This one too.</h5><h3 id="932393874">I think you see where this is going.</h3>

I have also tried whitelisting heading elements with IDs, to no avail, and even directly manipulating the defaults stored in the $config object:

$config->def->defaults['Attr.EnableID'] = true;

The IDs are important because they are assigned by a PHP script, stored in MySQL, and later picked up by a JS navigation system. They have to come in with the user's input, because they often stay the same across subsequent content updates.

Solution

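I believe that's because IDs that begin with a digit, like yours, are invalid in HTML 4, so HTML Purifier strips them as invalid attribute values. The HTML 4.01 specification says: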

ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").
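
To see which of your IDs fall foul of that rule, something like the following sketch might help. Note that isValidHtml4Id is a hypothetical helper written for this answer, not part of HTML Purifier; it only mirrors the rule quoted above.

```php
<?php
// Hypothetical helper (not an HTML Purifier API): checks an ID against the HTML 4 rule quoted above.
function isValidHtml4Id(string $id): bool
{
    // First character must be a letter; the rest may be letters, digits, "-", "_", ":" or ".".
    // The D modifier stops "$" from matching before a trailing newline.
    return preg_match('/^[A-Za-z][A-Za-z0-9\-_:.]*$/D', $id) === 1;
}

var_dump(isValidHtml4Id('1843804297'));  // bool(false): starts with a digit
var_dump(isValidHtml4Id('h1843804297')); // bool(true): starts with a letter
```

All three IDs in your sample string start with a digit, so all three would fail this check.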

Try using IDs that begin with a letter (for example, by putting a fixed letter prefix in front of the numbers), or change the Doctype.
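
If you can't change the stored IDs, one option that may work is HTML Purifier's Attr.IDPrefix setting, which as far as I know prepends a string to every user-supplied ID before it is validated. This is only a sketch based on your configuration: the prefix 'h-' is an arbitrary choice of mine, and I haven't tested it against your version of the library.

```php
<?php
require_once 'htmlpurifier/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'UTF-8');
$config->set('HTML.Doctype', 'HTML 4.01 Transitional');
$config->set('Attr.EnableID', true);
// Assumption: 'h-' is an illustrative prefix; any prefix that starts with a letter should do.
$config->set('Attr.IDPrefix', 'h-');

$purifier = new HTMLPurifier($config);

// The purified IDs should now carry the prefix and begin with a letter.
echo $purifier->purify('<h6 id="1843804297">This is a title</h6>');
```

Keep in mind that the IDs in the purified output would then include the prefix, so your JS navigation would have to look them up with the prefix too.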

Licensed under: CC-BY-SA with attribution