Question

Now that I can navigate a Web page via WWW::Mechanize and get information via HTML::TreeBuilder::XPath by accessing an id, I am left using Firebug to read the DOM in order to discover the layout of the HTML tree. The content that Mechanize captures is unstructured HTML, not good for human eyes.

Is using Firebug to ascertain the id I am after a typical approach? Once I get the id then I'm good to go, it's just that I've got several ids and pages with more ids to chase down and I was hoping to get (dump, print, etc.) a formatted layout of the DOM in order to make that discovery easier. Though granted, Firebug makes it pretty easy, too. I'm just wondering if I am missing an easier method.

Crossposted at PerlMonks.

Was it helpful?

Solution

If you need text, xmllint --html --format (comes with libxml2) does a decent job.

If you want a tree and mess with it and test out various expressions in a GUI, then Xacobeo is your new best friend.

Xacobeo screenshot

Note: since both those tools rely on libxml, replace HTML::TreeBuilder::XPath with HTML::TreeBuilder::LibXML for compatibility. Evaluating XPath will be faster that way, too.


If you know Javascript/JQuery, then also install FireQuery. You can then test out CSS expressions in FireBug, and use them with modules that select HTML through CSS expressions, e.g. Web::Query.

FireQuery screenshot

OTHER TIPS

I use XML Developer from Oxygen IDE for my recent development on XPath: http://www.oxygenxml.com/download.html It is a 30-day trial type of tool, but you can also search for XPath visualizer

It doesn't visualize a tree for you as far as I know (maybe there's a panel doing that). But it gives you some smart complete functionally that helps you to know what nodes you have available at any point. It is pretty big for XPath because it is hard to know where the parser pivot is really pointing at.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top