Locating a < symbol in HTML that isn't part of a tag

Question 1

Try using the answer from this question:

I tried to post this as it stands, but Stack Overflow requires me to add some description to the answer; otherwise it is automatically converted into a comment, which can't be accepted as an answer.

Question 2

Try reading the string character by character from the start. When you encounter a <, push it into a buffer. If a > is then found without a space, it's a tag. If you encounter another < first, mark the previous one as &lt; and put the new one in the buffer. Repeat until the end of the string.
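The scan described above can be sketched roughly as follows. This is only one reading of the idea, not a full HTML tokenizer: the function name escapeStrayLessThan is hypothetical, and the "without a space" rule is interpreted here as "the < must be followed directly by a letter, / or !", which is an assumption.

```php
<?php
// Hypothetical helper: escape every '<' that does not open a tag.
function escapeStrayLessThan(string $html): string
{
    $out = '';
    $buffer = null; // text collected since the last unresolved '<'
    $length = strlen($html);

    for ($i = 0; $i < $length; $i++) {
        $char = $html[$i];

        if ($char === '<') {
            if ($buffer !== null) {
                // A second '<' arrived before any '>': the earlier one was literal.
                $out .= '&lt;' . substr($buffer, 1);
            }
            $buffer = '<';
        } elseif ($buffer === null) {
            // Not inside a candidate tag: copy the character through.
            $out .= $char;
        } elseif ($char === '>') {
            // Assumed rule: a real tag has a letter, '/' or '!' right after '<'.
            $isTag = preg_match('~^<[a-zA-Z/!]~', $buffer) === 1;
            $out .= ($isTag ? $buffer : '&lt;' . substr($buffer, 1)) . '>';
            $buffer = null;
        } else {
            $buffer .= $char;
        }
    }

    // An unclosed '<' at the end of the input is literal too.
    if ($buffer !== null) {
        $out .= '&lt;' . substr($buffer, 1);
    }

    return $out;
}
```

With this sketch, escapeStrayLessThan('<p>1 < 2</p>') gives '<p>1 &lt; 2</p>'. It does not handle a > inside attribute values, comments, or script content, so a real parser is still the safer choice for arbitrary markup.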

Question 3

While it's no longer maintained, I think the PHP port of html5lib is probably your best bet for parsing bad markup.

A simple call like this:

require_once 'your-path-to-html5lib/Parser.php';
$dom = HTML5_Parser::parse($input);

will take bad markup in $input and return a valid PHP DOMDocument.

From there you can save it back to a string with $dom->saveHTML() or $dom->saveXML(), or extract the bits you want with the DOM API.

Note that this will produce a full HTML document with head and body etc. even if your original data didn't include that.

If you just want to parse an HTML fragment, you can do:

$dom = HTML5_Parser::parseFragment($input);

which will return a DOMNodeList.
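Once you have a DOMNodeList, working with it might look like the sketch below. It does not call html5lib: as a stand-in, it assumes PHP's built-in DOM extension and builds the tree with DOMDocument::loadHTML(), and the sample markup is made up for illustration.

```php
<?php
// Stand-in for the parser's output: build a tree with the built-in DOM extension.
libxml_use_internal_errors(true); // collect libxml warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML('<div><p>first</p><p>second</p></div>');

// getElementsByTagName() returns a DOMNodeList, which can be iterated directly.
$paragraphs = $dom->getElementsByTagName('p');

$texts = [];
$markup = [];
foreach ($paragraphs as $node) {
    // Plain text content of the node.
    $texts[] = $node->textContent;
    // Serialize just this node back to an HTML string.
    $markup[] = $node->ownerDocument->saveHTML($node);
}
```

Passing a node to saveHTML() serializes only that node rather than the whole document, which avoids the extra head and body wrapper when you only need a piece of the tree.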

Question 4

HTML entities are the best way to do this. &lt; and &gt; are the entities used to represent < and > in HTML, and you need them even inside the <code> tag. Replace any literal < or > in your content with these entities. See www.w3schools.com/html/html_entities.asp
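In PHP, the built-in htmlspecialchars() function does this replacement for you. A minimal sketch, with a made-up sample string:

```php
<?php
$text = 'if (a < b && b > c)';

// Encode the special characters so they display as text instead of markup.
$encoded = htmlspecialchars($text);
// $encoded is now: if (a &lt; b &amp;&amp; b &gt; c)

// Decode them again to recover the original string.
$decoded = htmlspecialchars_decode($encoded);
```

Note that this encodes every < and > in the string, so it suits text that should contain no tags at all. It will not tell a stray < apart from one that opens a real tag.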

Locating a < symbol in HTML that isn't part of a tag

Example input:

Required output