Question

I'm parsing an XML file with LibXML and need to sort the entries by date. Each entry has two date fields, one for when the entry was published and one for when it was updated.

<?xml version="1.0" encoding="utf-8"?>
...
<entry>
  <published>2009-04-10T18:51:04.696+02:00</published>
  <updated>2009-05-30T14:48:27.853+03:00</updated>
  <title>The title</title>
  <content>The content goes here</content>
</entry>
...

The XML file is already ordered by date updated, with the most recent first. I can easily reverse that to put the older entries first:

my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($file);
my $xc = XML::LibXML::XPathContext->new($doc->documentElement());

foreach my $entry (reverse($xc->findnodes('//entry'))) {
  ...
}

However, I need to reverse sort the file by date published, not by date updated. How can I do that? The timestamp looks a little wonky too. Would I need to normalize that first?

Thanks!

Update: After fiddling around with XPath namespaces and failing, I made a function that parsed the XML and stored the values I needed in a hash. I then used a bare sort to sort the hash, which works just fine now.
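
For readers who hit the same wall: XPath expressions such as //entry match nothing when the document declares a default namespace, because unprefixed names in XPath only match elements in no namespace. The following is a minimal sketch of the usual workaround with registerNs. It assumes the feed is Atom with the namespace URI shown, which the question does not state; the prefix "atom" and the inline sample document are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document; the real file's root element and
# namespace URI may differ.
my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <published>2009-04-10T18:51:04.696+02:00</published>
    <title>The title</title>
  </entry>
</feed>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
my $xc  = XML::LibXML::XPathContext->new($doc->documentElement());

# Bind a prefix of our choosing to the document's default namespace.
$xc->registerNs(atom => 'http://www.w3.org/2005/Atom');

# Every step in the path now needs the prefix.
my @entries = $xc->findnodes('//atom:entry');

for my $entry (@entries) {
    # The second argument makes the expression relative to this entry.
    print $xc->findvalue('atom:published', $entry), "\n";
}
```

The prefix only has to match inside the XPath expressions; it does not need to appear in the XML file itself.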

Solution

One way would be changing your reverse to a sort statement (untested):

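sub parse_date {
    # Reduces 2009-04-10T18:51:04.696+02:00 to 20090410. Only the calendar
    # day is kept; the time of day and the UTC offset are ignored.
    my $date = shift;
    return join "", $date =~ m!\A(\d{4})-(\d{2})-(\d{2})!;
}

sub by_published_date {
    # getChildrenByTagName returns nodes, so take the first one and read its
    # text rather than passing the node itself to parse_date.
    my ($a_node) = $a->getChildrenByTagName('published');
    my ($b_node) = $b->getChildrenByTagName('published');
    my $a_published = parse_date( $a_node->textContent );
    my $b_published = parse_date( $b_node->textContent );

    # Putting $b_published first gives descending order.
    return $b_published <=> $a_published;
}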

foreach my $entry ( sort by_published_date $xc->findnodes('//entry') ) {
    ...
}

Hope this helps a bit!

OTHER TIPS

A bare sort may put times from different timezones out of order:

print "$_\n" for sort "2009-06-15T08:00:00+07:00", "2009-06-15T04:00:00+00:00";

Here, the second time (04:00 UTC) is 3 hours after the first (01:00 UTC), but it sorts first because the strings are compared character by character.
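
One way around this is to convert each timestamp to epoch seconds before comparing. The sketch below uses only the core Time::Local module. The helper name rfc3339_to_epoch is made up for illustration, and the regex assumes timestamps shaped like the ones in the question (it does not handle leap seconds).

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: converts an RFC 3339 timestamp to epoch seconds in
# UTC, keeping any fractional seconds.
sub rfc3339_to_epoch {
    my ($stamp) = @_;
    my ($y, $mo, $d, $h, $mi, $s, $frac, $tz) = $stamp =~ m{
        \A (\d{4}) - (\d{2}) - (\d{2}) [Tt]
           (\d{2}) : (\d{2}) : (\d{2}) (\.\d+)?
           (Z | z | [+-]\d{2}:\d{2}) \z
    }x or die "Not an RFC 3339 timestamp: $stamp";

    # Treat the wall-clock fields as UTC first, then correct for the offset.
    my $epoch = timegm($s, $mi, $h, $d, $mo - 1, $y);
    $epoch += $frac if defined $frac;

    if (lc $tz ne 'z') {
        my ($sign, $oh, $om) = $tz =~ /\A([+-])(\d{2}):(\d{2})\z/;
        my $offset = ($oh * 60 + $om) * 60;
        # A +02:00 local time is two hours ahead of UTC, so subtract.
        $epoch += $sign eq '+' ? -$offset : $offset;
    }
    return $epoch;
}

my @stamps = ("2009-06-15T08:00:00+07:00", "2009-06-15T04:00:00+00:00");

# Newest first, compared as instants rather than as strings.
my @newest_first =
    sort { rfc3339_to_epoch($b) <=> rfc3339_to_epoch($a) } @stamps;

print "$_\n" for @newest_first;
```

With many entries it may be worth computing the epoch value once per entry and sorting on the cached value instead of re-parsing inside every comparison.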

I'm not sure what you mean by "wonky". Your example just shows timestamps in RFC 3339 format, so they don't need normalizing before they can be parsed.

Licensed under: CC-BY-SA with attribution