Question

I have been developing Java programs that parse the HTML source of web pages using various HTML parsers such as Jericho and NekoHTML.

Now I want to write parsers in PHP. Before I start: are there any HTML parsers available that I can use with PHP to parse HTML code?


Solution

Check out DOMDocument.

Example #1 Creating a Document

<?php
$doc = new DOMDocument();
$doc->loadHTML("<html><body>Test<br></body></html>");
echo $doc->saveHTML();
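
Once a document is loaded, the usual next step is to pull data out of it. The following is a minimal sketch, not part of the manual example above: the sample markup and the `$links` variable are made up for illustration, and it relies only on the standard `getElementsByTagName()`, `getAttribute()` and `textContent` members of the DOM extension.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<html><body><a href="/a">First</a><a href="/b">Second</a></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Map each link's href attribute to its visible text.
$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[$a->getAttribute('href')] = trim($a->textContent);
}

print_r($links);
```

For more selective queries, DOMXPath can be run against the same `$doc` object.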

OTHER TIPS

PHP's built-in DOM extension does a very good job. There are many other XML parsers, too.

DOM is pretty good for this. It can also cope with invalid markup, but loading imperfect markup makes it raise warnings (and, in some cases, errors), so I suggest filtering the HTML with HTMLPurifier or a similar library before loading it into the DOM.
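
If adding a filtering library is not an option, one alternative (a different technique from the HTMLPurifier suggestion above, shown here only as a sketch) is to have libxml collect parse errors internally instead of emitting PHP warnings. The sample markup and the `$messages` variable are assumptions made for illustration; `libxml_use_internal_errors()`, `libxml_get_errors()` and `libxml_clear_errors()` are standard PHP functions.

```php
<?php
// Hypothetical malformed markup: <p> and <b> are never closed.
$html = '<html><body><p>Hello <b>world</body></html>';

// Collect parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Each entry is a LibXMLError; keep the messages for logging or inspection.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = trim($error->message) . ' (line ' . $error->line . ')';
}

// Empty the error buffer and restore the previous error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The parser still builds a usable tree from the broken markup.
echo $doc->getElementsByTagName('p')->length, " paragraph(s) found\n";
```

Which errors are reported, if any, depends on the libxml2 version PHP is built against, so the contents of `$messages` should be treated as diagnostic output rather than something to rely on.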

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow