Question

I am working with many XML files, and I want to replace some particular content only in a specific region of each file. For example:

The files may contain many elements like the following:

<h2>Content comes here</h2>

Now I want to replace a word only within the <h2>...</h2> regions in all files.

Please advise. Thanks in advance.


Solution

General text replacement in Perl is usually done using regexes and the s/// operator. However, it is considered inadvisable to try to interpret the structure of an XML file using only regexes.
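
For reference, here is a minimal sketch of s/// on a plain string, with no XML involved. The words 'comes' and 'goes' are placeholders chosen for illustration; the question does not say which word should be replaced.

```perl
use strict;
use warnings;

# Hypothetical sample text and replacement words (not from the question).
my $text = 'Content comes here';

# Copy the string, then substitute on the copy so $text stays untouched.
# \b anchors the match to whole words; /g replaces every occurrence.
(my $new = $text) =~ s/\bcomes\b/goes/g;

print "$new\n";
```

The substitution itself is the easy part. The hard part is deciding which text belongs to an h2 element, and that is the job better left to a parser.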

You should use a module which parses XML. XML::Simple will allow you to load the whole document as a Perl data structure (using hashrefs for attributes and subtags, etc.), which you can then traverse to make the replacement you want. However, you then have to serialize that structure back to XML yourself.

XML::Parser is a good bet in my opinion. It is conceptually a bit trickier, but it is designed to do exactly the sort of thing you want. You set up handler functions which get called every time the parser finds the start or end of a tag. In your case, all these have to do is output the tag and its contents, except when it's an h2 tag, in which case you do some extra processing.
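
The handler approach could be sketched roughly as follows. This is only an illustration under some assumptions: the sample document, the $in_h2 counter, and the words 'comes' and 'goes' are all made up for the example, and the output is collected in a string rather than written to a file.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical input; in practice you would read each file instead.
my $xml = '<doc><h2>Content comes here</h2><p>Nothing comes here</p></doc>';

my $in_h2 = 0;    # greater than zero while inside an <h2> element
my $out   = '';   # the rewritten document is accumulated here

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $tag) = @_;
            $in_h2++ if $tag eq 'h2';
            $out .= $expat->recognized_string;   # copy the tag as written
        },
        End => sub {
            my ($expat, $tag) = @_;
            $in_h2-- if $tag eq 'h2';
            $out .= $expat->recognized_string;
        },
        Char => sub {
            my ($expat) = @_;
            # Use the raw markup so escapes such as &amp; are preserved.
            my $raw = $expat->recognized_string;
            # Caveat: the parser may deliver one run of text in several
            # chunks, so a word split across chunks would be missed.
            $raw =~ s/\bcomes\b/goes/g if $in_h2;
            $out .= $raw;
        },
        # Everything else (comments, declarations, ...) is copied through.
        Default => sub { $out .= $_[1] },
    },
);

$parser->parse($xml);
print "$out\n";
```

To process many files, the same handlers could be reused with a fresh $out and $in_h2 per file, for example by calling $parser->parsefile($path) inside a loop over the file names.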

There are also some DOM-oriented parsers, which you might prefer if you are used to doing this sort of thing in JavaScript or with some other DOM-based XML library.
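
As one possible illustration of the DOM style, here is a sketch using XML::LibXML. That module is my own choice for the example and is not named above; the sample document and the words 'comes' and 'goes' are likewise placeholders.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical input; load_xml also accepts location => $path for files.
my $xml = '<doc><h2>Content comes here</h2><p>Nothing comes here</p></doc>';

my $doc = XML::LibXML->load_xml(string => $xml);

# Select every text node at any depth below an <h2> element.
for my $node ($doc->findnodes('//h2//text()')) {
    my $data = $node->data;
    $data =~ s/\bcomes\b/goes/g;   # placeholder replacement
    $node->setData($data);
}

my $result = $doc->toString;
print $result;
```

Because the whole tree is in memory, the XPath expression does the work of finding the right region, at the cost of loading each document completely.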

Lastly, for the sake of completeness, you can probably write a (very short) XSLT file which will do this transformation (I'm not an expert, so I'm not sure exactly how) and apply it using XML::XSLT, basically in one line.

Licensed under: CC-BY-SA with attribution