Question

I need to process HTML submitted in my web application and don't want to munge the whole thing with regular expressions. What tokenizer approach and/or software should I use?

Solution

I would load the HTML with PHP's DOMDocument::loadHTML() method, which parses the markup into a tree you can traverse instead of matching it with regular expressions. If you want simpler handling than the DOMDocument methods offer, you can convert the document to a SimpleXML object with simplexml_import_dom().
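
As a rough illustration of that approach, here is a minimal sketch. The sample markup and the variable names ($html, $links, $paragraphs) are made up for the example, and it assumes the DOM, libxml and SimpleXML extensions are enabled, which is the default in most PHP builds.

```php
<?php
// Hypothetical user-submitted fragment; the markup is illustrative only.
$html = '<div><p>Hello <a href="https://example.com/">world</a></p><p>Second</p></div>';

// Submitted HTML is often malformed, so collect parser warnings
// internally instead of letting them be emitted as PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html); // libxml wraps the fragment in <html><body>

// Discard the collected warnings and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Option 1: walk the tree with the DOM API.
$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[] = $a->getAttribute('href');
}

// Option 2: convert to SimpleXML for lighter-weight access.
$xml = simplexml_import_dom($doc);
$paragraphs = $xml->xpath('//p'); // array of SimpleXMLElement objects
```

Here $links ends up holding the href of each anchor and $paragraphs holds one SimpleXMLElement per <p> element. Note that loadHTML() treats input without a declared charset as ISO-8859-1, so non-ASCII input may need an explicit encoding declaration; that is left out of this sketch.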

Licensed under: CC-BY-SA with attribution