Question

I'm a beginner with Perl and CPAN modules.

I want to convert an XML file that contains:

<Item><Link>http://example.com/</Link></Item>....

to:

<Item><Link>http://mysite.com/</Link></Item>....

Is there a smart solution using a CPAN module?


Solution

  • see XML::Twig - A perl module for processing huge XML documents in tree mode.
  • or XML::Simple - Easy API to maintain XML (esp config files)

For example, using XML::Simple:

use strict;
use warnings; 
use XML::Simple;
use Data::Dumper;

my $xml = q~<?xml version='1.0'?>
<root>
  <Item>
  <Link>http://example.com/</Link>
  </Item>
  <Item>
   <Link>http://example1.com/</Link>
  </Item>
</root>~;

print $xml,$/;

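# ForceArray keeps Item an array ref even when the file has a single <Item>
my $data = XMLin($xml, ForceArray => ['Item']);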

print Dumper( $data );

foreach my $test (@{$data->{Item}}){
   foreach my $key (keys %{$test}){
       $test->{$key} =~ s/example/mysite/;
   }
}
print XMLout($data, RootName => 'root', NoAttr => 1, XMLDecl => 1);

output:

<?xml version='1.0'?>
<root>
  <Item>
  <Link>http://example.com/</Link>
  </Item>
  <Item>
   <Link>http://example1.com/</Link>
  </Item>
</root>
$VAR1 = {
          'Item' => [
                    {
                      'Link' => 'http://example.com/'
                    },
                    {
                      'Link' => 'http://example1.com/'
                    }
                  ]
        };
<?xml version='1.0' standalone='yes'?>
<root>
  <Item>
    <Link>http://mysite.com/</Link>
  </Item>
  <Item>
    <Link>http://mysite1.com/</Link>
  </Item>
</root>

OTHER TIPS

A simple solution using XML::Twig is below. Compared with the XML::Simple option, it works no matter where the Link elements are in the XML, and it preserves the original formatting of the file. It also works if the XML contains mixed content.

If you need to change the file in place, you can use parsefile_inplace instead of parsefile. The regular expression passed to subs_text may need to be improved for real-world data, but this code should be a good starting point.

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

XML::Twig->new( twig_roots => { Link => \&replace_link, }, # process Link
                twig_print_outside_roots => 1,             # output everything else
              )
          ->parsefile( 'my.xml');

sub replace_link
  { my( $t, $link)= @_;
    # keep the trailing slash so the result matches http://mysite.com/
    $link->subs_text( qr{^http://example\.com/$}, 'http://mysite.com/');
    $t->flush;               # or $link->print, outputs the modified (or not) link
  }
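
As a rough sketch of how the pattern might be loosened for real-world data, the variant below rewrites only the scheme-and-host prefix, so links that carry a path are converted as well. It is not part of the answer above: the sample XML and variable names are made up for illustration, and it uses twig_handlers with text and set_text, which loads the whole document into memory instead of streaming it through twig_roots.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input; the second URL has a path after the host
my $xml = '<root><Item><Link>http://example.com/</Link></Item>'
        . '<Item><Link>http://example.com/page.html</Link></Item></root>';

my $twig = XML::Twig->new(
    twig_handlers => {
        Link => sub {
            my ( $t, $link ) = @_;
            my $url = $link->text;
            # rewrite only the leading scheme and host, keep the rest of the URL
            $url =~ s{^http://example\.com/}{http://mysite.com/};
            $link->set_text($url);
        },
    },
);
$twig->parse($xml);

# serialize the modified document to a string and print it
my $result = $twig->sprint;
print $result, "\n";
```

For large files, the streaming twig_roots approach shown above is probably the better fit; this version trades memory for a simpler handler.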

If all you need is to change one specific value, you don't need anything special; a plain regular expression will do.
From the command line:

perl -pi -e 's@http://example\.com/@http://mysite.com/@g' file.xml

Edit: here is a full script version:

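use strict;
use warnings;

my $file = '/tmp/test.xml';

# three-argument open with lexical filehandles
open my $in,  '<', $file       or die "can't open $file: $!";
open my $out, '>', "$file.tmp" or die "can't open $file.tmp: $!";

# read line by line instead of loading the whole file into memory
while ( my $line = <$in> ) {
    $line =~ s@http://example\.com/@http://mysite.com/@g;
    print {$out} $line;
}
close $in;
close $out or die "can't close $file.tmp: $!";

# replace the original only after the copy was written successfully
rename "$file.tmp", $file or die "can't rename $file.tmp to $file: $!";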
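
One detail of the regex approach is worth illustrating: an unescaped "." in the pattern matches any character, not just a dot. A minimal sketch, assuming you would rather pass the URL as a plain string than escape it by hand, is to wrap it in \Q...\E. The helper name replace_url and the sample lines are hypothetical, not from the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every literal occurrence of $from with $to.
sub replace_url {
    my ( $text, $from, $to ) = @_;
    # \Q...\E quotes regex metacharacters in $from, so "." matches only a dot
    $text =~ s/\Q$from\E/$to/g;
    return $text;
}

my $line = "<Item><Link>http://example.com/</Link></Item>\n";
print replace_url( $line, 'http://example.com/', 'http://mysite.com/' );

# A look-alike host that an unescaped "." would match is left untouched
my $other = "<Item><Link>http://exampleXcom/</Link></Item>\n";
print replace_url( $other, 'http://example.com/', 'http://mysite.com/' );
```

The same quoting can be used inside the one-liner or the script above if the URL comes from a variable.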
Licensed under: CC-BY-SA with attribution