How do I use XML::LibXML to parse XML using SAX?
19-09-2019
Question
The only example code I have found so far is so old it won't work anymore (uses deprecated classes). All I need is something basic that demonstrates:
Loading and parsing the XML from a file
Defining the SAX event handler(s)
Reading the attributes or text values of the element passed to the event handler
Solution
How about the distribution itself?
Go to the XML::LibXML distribution page and click browse.
Note the following caution in the documentation:
At the moment XML::LibXML provides only an incomplete interface to libxml2's native SAX implementation. The current implementation is not tested in production environment. It may causes significant memory problems or shows wrong behaviour.
There is also XML::SAX, which comes with nice documentation. I have used it a few times and it worked well for my purposes.
OTHER TIPS
Sinan's suggestion was good, but it didn't connect all the dots. Here is a very simple program that I cobbled together:
file 1: The handlers (MySAXHandler.pm)
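package MySAXHandler;
use strict;
use warnings;
use base qw(XML::SAX::Base);

# Called once, before any element events.
sub start_document {
    my ($self, $doc) = @_;
    # process document start event
}

# Called for each opening tag; $el is a hash describing the element.
sub start_element {
    my ($self, $el) = @_;
    # process element start event
    print "Element: " . $el->{LocalName} . "\n";
}

1;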
file 2: The test program (test.pl)
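#!/usr/bin/perl
use strict;
use warnings;
use FindBin;
use lib $FindBin::Bin;    # assumes MySAXHandler.pm sits next to this script
use XML::SAX;
use MySAXHandler;

# Let XML::SAX pick an installed parser and attach the handler to it.
my $parser = XML::SAX::ParserFactory->parser(
    Handler => MySAXHandler->new
);
$parser->parse_uri("some-xml-file.xml");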
Note: how to get the value of an element attribute. The documentation did not describe this in a way that I could use, and it took me over an hour to figure out the syntax. The keys of the Attributes hash are written as {NamespaceURI}LocalName, not as the prefixed name that appears in the file. In my XML file, the attribute was ss:Index, and the namespace definition for ss was xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet". Thus, in order to get the silly Index attribute, I needed this:
my $ssIndex = $el->{Attributes}{'{urn:schemas-microsoft-com:office:spreadsheet}Index'}{Value};
That was painful.
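To tie the attribute lookup together with reading text values, here is a minimal single-file sketch. The CellHandler package, the sample XML string, and the lowercase keys stored on $self are all made up for illustration; only the XML::SAX calls and the {NamespaceURI}LocalName key form come from the discussion above. It assumes the parser that XML::SAX selects has namespace processing turned on, which is the usual default.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical handler, defined inline so the sketch fits in one file.
package CellHandler;
use base qw(XML::SAX::Base);

my $SS = 'urn:schemas-microsoft-com:office:spreadsheet';

sub start_element {
    my ($self, $el) = @_;
    return unless $el->{LocalName} eq 'Cell';
    # Attribute keys have the form "{NamespaceURI}LocalName".
    my $attr = $el->{Attributes}{ '{' . $SS . '}Index' };
    $self->{index}   = $attr ? $attr->{Value} : undef;
    $self->{text}    = '';
    $self->{in_cell} = 1;
}

sub characters {
    my ($self, $chars) = @_;
    # Text can arrive in several chunks, so append instead of assigning.
    $self->{text} .= $chars->{Data} if $self->{in_cell};
}

sub end_element {
    my ($self, $el) = @_;
    return unless $el->{LocalName} eq 'Cell';
    # Store one [index, text] pair per Cell element.
    push @{ $self->{cells} }, [ $self->{index}, $self->{text} ];
    $self->{in_cell} = 0;
}

package main;
use XML::SAX;

# Made-up sample document with one namespaced attribute.
my $xml = '<Row xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">'
        . '<Cell ss:Index="3">hello</Cell></Row>';

my $handler = CellHandler->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
$parser->parse_string($xml);

for my $cell (@{ $handler->{cells} || [] }) {
    my ($index, $text) = @$cell;
    print "Cell ", (defined $index ? $index : '?'), ": $text\n";
}
```

The characters event is the only place text values show up, and a parser may split one text node across several calls, which is why the handler accumulates text until end_element fires.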
XML::LibXML::SAX implements the Perl SAX interface, and it comes with nice documentation.
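If you want libxml2 to do the parsing instead of whichever parser the factory picks, the same kind of handler can be attached to XML::LibXML::SAX directly. This is a rough sketch, not a tested recipe: the NameCollector package and the sample string are hypothetical, and it relies on XML::LibXML::SAX inheriting new and parse_string from XML::SAX::Base. Keep the caution quoted from the documentation above in mind.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical handler that records the name of every element seen.
package NameCollector;
use base qw(XML::SAX::Base);

sub start_element {
    my ($self, $el) = @_;
    push @{ $self->{names} }, $el->{LocalName};
}

package main;
use XML::LibXML::SAX;

my $handler = NameCollector->new;

# Construct the libxml2-backed SAX parser directly, bypassing the factory.
my $parser = XML::LibXML::SAX->new(Handler => $handler);
$parser->parse_string('<a><b/><c/></a>');

print join(',', @{ $handler->{names} || [] }), "\n";
```

Because the handler only depends on XML::SAX::Base, it can be moved between parsers without changes.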