Question

I'm using XML::Twig::XPath to work with ITS data, and am trying to figure out how to resolve XPath expressions with variables in them. Here's an example of what I need to work with from the ITS spec:

<its:rules version="2.0">
  <its:param name="LCID">0x0409</its:param>
  <its:translateRule selector="//msg[@lcid=$LCID]" translate="yes"/>
</its:rules>

I need to be able to evaluate the XPath expression contained in selector, with the value of the variable being the contents of the its:param element. I am at a loss as to how to do this. The documentation of XML::XPath mentions variables (which I assume should be part of the context), and it even has a class to represent them, but the documentation doesn't say how to specify variables in a context. I would be even more unsure of how to access such functionality from XML::Twig, if at all possible.

Does anyone know how to do this? Or alternatively, can you give an example of how to use such functionality with another module such as XML::LibXML (which mentions variables extensively, but leaves me a little unsure as to how to do this with variables that are strings)?


Solution

libxml2 and XML::LibXML support XPath 1.0 paths, including variable references. You supply the values through a lookup callback registered on the XPath context.

use strict;
use warnings;
use feature qw( say );

use XML::LibXML               qw( );
use XML::LibXML::XPathContext qw( );

sub dict_lookup {
   my ($dict, $var_name, $ns) = @_;
   $var_name = "{$ns}$var_name" if defined($ns);
   my $val = $dict->{$var_name};
   if (!defined($val)) {
      warn("Unknown variable \"$var_name\"\n");
      $val = '';
   }

   return $val;
}

my $xml = <<'__EOI__';
<r>
<e x="a">A</e>
<e x="b">B</e>
</r>
__EOI__

my %dict = ( x => 'b' );

my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($xml);

my $xpc = XML::LibXML::XPathContext->new();
$xpc->registerVarLookupFunc(\&dict_lookup, \%dict);

say $_->textContent() for $xpc->findnodes('//e[@x=$x]', $doc);
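
Applied to the ITS rules from the question, the same approach might look like the sketch below. It assumes the rules are embedded in the document being queried; the surrounding <doc> element and the <msg> elements are made up for illustration, and translatable_text is a hypothetical helper name. The namespace URI is the one used in the XML::Twig answer further down.

```perl
use strict;
use warnings;

use XML::LibXML;
use XML::LibXML::XPathContext;

my $ITS_NS = 'http://www.w3.org/2005/11/its';

# Hypothetical sample: the rules come from the question, the rest is invented.
my $xml = <<'__EOI__';
<doc xmlns:its="http://www.w3.org/2005/11/its">
  <its:rules version="2.0">
    <its:param name="LCID">0x0409</its:param>
    <its:translateRule selector="//msg[@lcid=$LCID]" translate="yes"/>
  </its:rules>
  <msg lcid="0x0409">Hello</msg>
  <msg lcid="0x040c">Bonjour</msg>
</doc>
__EOI__

sub translatable_text {
   my ($xml) = @_;

   my $doc = XML::LibXML->load_xml( string => $xml );
   my $xpc = XML::LibXML::XPathContext->new($doc);
   $xpc->registerNs( its => $ITS_NS );

   # Collect every its:param as name => text.
   my %params =
      map { $_->getAttribute('name') => $_->textContent() }
      $xpc->findnodes('//its:param');

   # The callback receives the data ref, the variable name and the namespace URI.
   $xpc->registerVarLookupFunc(
      sub { my ($dict, $name) = @_; return $dict->{$name} // ''; },
      \%params,
   );

   # Evaluate each selector as-is; $LCID is resolved through the callback.
   my @texts;
   for my $rule ( $xpc->findnodes('//its:translateRule[@translate="yes"]') ) {
      push @texts,
         map { $_->textContent() }
         $xpc->findnodes( $rule->getAttribute('selector') );
   }

   return @texts;
}

print "$_\n" for translatable_text($xml);
```

Unknown variables silently resolve to an empty string here; the dict_lookup above, which warns, is probably the better choice for real data.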

OTHER TIPS

Here is a complete solution.

I sidestepped the "what's a QName" part by building a regexp from the parameter names already found. This might be slow if there are many parameters, but it works fine on the W3C's example. Building the regexp means escaping each name between \Q/\E so meta-characters in the names are ignored, sorting the names by length (longest first) so a shorter name doesn't match instead of a longer one, then joining them with '|'.

Limitations:

  • there is no error handling if you use a parameter that's not defined previously,
  • namespaces in selectors are not handled; this is easy to add if you have real data: just add the appropriate map_xmlns declarations,
  • the whole document is loaded in memory, which is hard to avoid if you want to use generic XPath selectors.

Here it is:

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig::XPath;

my %param;
my $mparam;
my @selectors;

my $t= XML::Twig::XPath->new( 
  map_xmlns     => { 'http://www.w3.org/2005/11/its' => 'its' },
  twig_handlers => { 'its:param' => sub { $param{$_->att( 'name')}= $_->text; 
                                          $mparam= join '|', 
                                                         map { "\Q$_\E" }
                                                         sort { length($b) <=> length($a) } keys %param;
                                        },
                     'its:translateRule[@translate="yes"]' =>
                                   sub { my $selector= $_->att( 'selector');
                                         $selector=~ s{\$($mparam)}{quote($param{$1})}eg;
                                         push @selectors, $selector;
                                       },
                   },
                            )
                       ->parse( \*DATA); # expects the ITS document after a __DATA__ line (not shown here)

foreach my $selector (@selectors)
  { my @matches= $t->findnodes( $selector);
    print "$selector: ";
    foreach my $match (@matches) { $match->print; print "\n"; }
  }

sub quote
  { my( $param)= @_;
    return $param=~ m{"} ? qq{'$param'} : qq{"$param"}; 
  }
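
To see why the names are sorted longest first, here is the substitution step on its own. This is only a sketch: the parameter names and values are hypothetical, picked so that one name is a prefix of the other, and substitute_params is an illustrative helper that always uses double quotes rather than the quote sub above.

```perl
use strict;
use warnings;

# Hypothetical parameters: "LC" is a prefix of "LCID".
my %param = ( LC => 'en', LCID => '0x0409' );

sub substitute_params {
   my ($selector, $param) = @_;

   # Longest names first, so that $LCID is not matched as $LC followed by "ID".
   my $names = join '|',
               map  { quotemeta }
               sort { length($b) <=> length($a) }
               keys %$param;

   $selector =~ s{\$($names)}{ '"' . $param->{$1} . '"' }ge;
   return $selector;
}

print substitute_params('//msg[@lcid=$LCID and @lang=$LC]', \%param), "\n";
# prints: //msg[@lcid="0x0409" and @lang="en"]
```

Without the sort, Perl tries the alternatives in whatever order keys returns them, so LC could match first and leave a stray ID in the selector.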

If your XPath engine gives you no way to bind variables, you could treat the selector as a template whose grammar is:

start : parts EOI
parts : part parts |
part  : string_literal | variable | other

The following produces the XPath from the XPath template.

sub text_to_xpath_lit {
   my ($s) = @_;
   return qq{"$s"} if $s !~ /"/;
   return qq{'$s'} if $s !~ /'/;

   $s =~ s/"/", '"', "/g;
   return qq{concat("$s")};
}

my $NCNameStartChar_class = '_A-Za-z\xC0-\xD6\xD8-\xF6\xF8-\x{2FF}\x{370}-\x{37D}\x{37F}-\x{1FFF}\x{200C}-\x{200D}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}';
my $NCNameChar_class = $NCNameStartChar_class . '\-.0-9\xB7\x{300}-\x{36F}\x{203F}-\x{2040}';
my $NCName_pat = "[$NCNameStartChar_class][$NCNameChar_class]*+";

my $xpath = '';
for ($xpath_template) {
   while (1) {
      if (/\G ( [^'"\$]++ ) /xgc) {
         $xpath .= $1;
      }
      elsif (/\G (?=['"]) /xgc) {
         /\G ( ' [^']*+ ' | " [^"]*+ " ) /sxgc
            or die("Unmatched quote\n");

         $xpath .= $1;
      }
      elsif (/\G \$ /xgc) {
         /\G (?: ( $NCName_pat ) : )?+ ( $NCName_pat ) /xgc
            or die("Unexpected '\$'\n");

         my ($prefix, $var_name) = ($1, $2);
         my $ns;   # stays undef for an unprefixed variable
         if (defined($prefix)) {
            $ns = $ns_map{$prefix}
               // die("Undefined prefix '$prefix'\n");
         }

         $xpath .= text_to_xpath_lit(var_lookup($ns, $var_name));
      }
      elsif (/\G \z /xgc) {
         last;
      }
   }    
}

Sample var_lookup:

sub var_lookup {
   my ($ns, $var_name) = @_;
   $var_name = "{$ns}$var_name" if defined($ns);
   my $val = $params{$var_name};
   if (!defined($val)) {
      warn("Unknown variable \"$var_name\"\n");
      $val = '';
   }

   return $val;
}

Untested.
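
XPath 1.0 string literals have no escape mechanism, which is why a value containing both kinds of quote has to go through concat(). The quoting step can be tried in isolation; the sketch below repeats the logic of text_to_xpath_lit under the illustrative name xpath_literal so that it runs on its own.

```perl
use strict;
use warnings;

# Same quoting logic as text_to_xpath_lit above, repeated so the snippet runs alone.
sub xpath_literal {
   my ($s) = @_;
   return qq{"$s"} if $s !~ /"/;   # no double quote: wrap in double quotes
   return qq{'$s'} if $s !~ /'/;   # no single quote: wrap in single quotes

   # Both kinds present: split around each double quote and let concat() rejoin.
   $s =~ s/"/", '"', "/g;
   return qq{concat("$s")};
}

print xpath_literal($_), "\n" for 'abc', 'say "hi"', q{it's "x"};
# prints:
#   "abc"
#   'say "hi"'
#   concat("it's ", '"', "x", '"', "")
```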

In XML::XPath, you can set variables on the XML::XPath::Parser object. It doesn't seem to be directly accessible via the XML::XPath object; you have to use $xp->{path_parser}, which is undocumented, to get to it. Here's an example with a string variable and also a nodeset variable:

use XML::XPath;
use XML::XPath::Parser;
use XML::XPath::Literal;

my $xp = XML::XPath->new(xml => <<'ENDXML');
<?xml version="1.0"?>
<xml>
    <a>
        <stuff foo="bar">
            junk
        </stuff>
    </a>
</xml>
ENDXML

#set the variable to the literal string 'bar'
$xp->{path_parser}->set_var('foo_att', XML::XPath::Literal->new('bar'));
my $nodeset = $xp->find('//*[@foo=$foo_att]');

foreach my $node ($nodeset->get_nodelist) {
    print "1. FOUND\n\n",
        XML::XPath::XMLParser::as_string($node),
        "\n\n";
}

#set the variable to the nodeset found from the previous query
$xp->{path_parser}->set_var('stuff_el', $nodeset);
$nodeset = $xp->find('/*[$stuff_el]');

foreach my $node ($nodeset->get_nodelist) {
    print "2. FOUND\n\n",
        XML::XPath::XMLParser::as_string($node),
        "\n\n";
}
Licensed under: CC-BY-SA with attribution