How to encode text outside the <pre></pre> tag with htmlentities()? (PHP)
Question
I'm trying to make my own BBCode parser for my website, and I'm looking for a way to apply htmlentities() to everything except the code inside PRE tags and the PRE tags themselves.
For example:
<b>Hello world</b> (outputs &lt;b&gt;Hello world&lt;/b&gt;)
<pre>"This must not be converted to HTML entities"</pre> (outputs <pre>"This must not be converted to HTML entities"</pre>)
I really have no idea how to do this.
Any kind of help would be appreciated :)
Thanks.
Solution
You could escape everything first, then convert the &lt;pre&gt; … &lt;/pre&gt; back to <pre> … </pre>:
// escape everything, including the <pre> tags
$str = htmlspecialchars($str);
// convert the escaped <pre> tags back (the text between them stays escaped)
$str = preg_replace('/&lt;pre&gt;((?:[^&]+|&(?!lt;\\/pre&gt;))*)&lt;\\/pre&gt;/s', '<pre>$1</pre>', $str);
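If the text inside the PRE block really must stay unescaped, as in the question's example, one alternative is to split the string on the PRE blocks and escape only the pieces in between. This is a minimal sketch, not part of the answer above: the function name escape_outside_pre() is made up for illustration, and it assumes plain, non-nested <pre> tags without attributes.

```php
<?php
// Hypothetical helper: escape everything except <pre>...</pre> blocks.
// Assumes plain <pre> tags (no attributes, no nesting).
function escape_outside_pre(string $str): string
{
    // Split on <pre>...</pre>; the capture group keeps the blocks in the result.
    $parts = preg_split('/(<pre>.*?<\/pre>)/s', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        // With a single capture group, even indexes hold the text between
        // blocks and odd indexes hold the captured <pre> blocks.
        if ($i % 2 === 0) {
            $parts[$i] = htmlspecialchars($part);
        }
    }

    return implode('', $parts);
}

echo escape_outside_pre('<b>Hello world</b> <pre>"This must not be converted"</pre>');
```

Keep in mind that whatever is inside the PRE block is passed through untouched, so raw HTML from users can slip through there. It still has to be sanitized some other way.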
OTHER TIPS
If this is for practice, fine. But if you just want the feature, don't reinvent the wheel. Parsing is not an easy task, and there are plenty of mature parsers out there. I would look at the PEAR packages first. Try HTML_BBCodeParser.
If you really want to do it yourself, you have two options:
- regexp
- state machines
Usually a mix of both is handy. But because tags can be nested and badly formed, this is really hard to code. At the very least, use generic parser code and define your own lexical rules; writing it all from scratch will take as much time as coding the rest of the web site.
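To give an idea of what mixing the two looks like, here is a rough sketch: a regexp cuts the input into tokens, and a stack of open tags plays the role of the state machine. The function name mini_bbcode() and the two-tag map are assumptions for illustration only; a real parser needs far more (attributes, URLs, lists, and so on).

```php
<?php
// Hypothetical mini-parser: regexp for tokenizing, a stack for nesting.
function mini_bbcode(string $text): string
{
    // Assumed tag map, for illustration only.
    $allowed = ['b' => 'strong', 'i' => 'em'];

    // Regexp step: split into plain text and [tag] / [/tag] tokens.
    $tokens = preg_split('/(\[\/?[a-z]+\])/i', $text, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $open = [];  // state: stack of currently open tags
    $out  = '';

    foreach ($tokens as $token) {
        if (preg_match('/^\[(\/?)([a-z]+)\]$/i', $token, $m)
            && isset($allowed[strtolower($m[2])])) {
            $tag = strtolower($m[2]);
            if ($m[1] === '') {
                // Opening tag: remember it and emit the HTML equivalent.
                $open[] = $tag;
                $out .= '<' . $allowed[$tag] . '>';
                continue;
            }
            if (end($open) === $tag) {
                // Closing tag that matches the innermost open tag.
                array_pop($open);
                $out .= '</' . $allowed[$tag] . '>';
                continue;
            }
            // Badly nested closing tag: fall through and print it as text.
        }
        // Plain text and unknown tags are escaped.
        $out .= htmlspecialchars($token);
    }

    // Close anything the user left open, innermost first.
    while ($open) {
        $out .= '</' . $allowed[array_pop($open)] . '>';
    }

    return $out;
}

echo mini_bbcode('[b]bold [i]both[/b] <script>');
```

In this sketch the mismatched [/b] is printed as literal text, the stray <script> is escaped, and the two tags left open are closed at the end. Even this toy version needs three separate branches just to cope with bad nesting.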
By the way, using a BBCode parser does not free you from sanitizing the user input...
EDIT: I'm in a good mood today, so here is a snippet showing how to use HTML_BBCodeParser:
// if you don't know how to use pear, you'd better learn that quick
// add the PEAR directory to the include path (adjust the path for your system)
ini_set("include_path", ini_get("include_path") . PATH_SEPARATOR . "/usr/share/pear");
// include PEAR and the parser
require_once("PEAR.php");
require_once("HTML/BBCodeParser.php");
// you can tweak settings from an ini file
$config = parse_ini_file("BBCodeParser.ini", true);
$options = &PEAR::getStaticProperty("HTML_BBCodeParser", "_options");
$options = $config["HTML_BBCodeParser"];
// the parsing starts here
$parser = new HTML_BBCodeParser();
$parser->setText($the_mighty_BBCode);
$parser->parse();
$parsed = $parser->getParsed();
// don't forget to clean that
// (note: strip_tags() also removes the HTML tags the parser generated)
echo htmlspecialchars(strip_tags($parsed));