Question

I'd like to print an XML document without reducing all of the Unicode contained in it to ugly NCRs (numeric character references). Here's a sample:

use XML::LibXML;
my $parser = XML::LibXML->new();
my $doc = $parser->load_xml(string => '<xml>ＦＵＬＬ ＷＩＤＴＨ</xml>');
print $doc->toString();

This prints the following:

<?xml version="1.0"?>
<xml>&#xFF26;&#xFF35;&#xFF2C;&#xFF2C; &#xFF37;&#xFF29;&#xFF24;&#xFF34;&#xFF28;</xml>

Very, very ugly and difficult to read (unless viewed in a browser or something).

How can I get the document to print real characters, and to have a utf-8 (or whatever other encoding) declaration?


Solution

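The parser's load_xml method returns an XML::LibXML::Document object, which has a setEncoding method. Setting an encoding makes toString() emit the characters in that encoding and adds the matching declaration: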

$doc->setEncoding('utf-8');

Now the script prints this:

<?xml version="1.0" encoding="utf-8"?>
<xml>ＦＵＬＬ ＷＩＤＴＨ</xml>
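
For reference, here is a minimal end-to-end sketch of the approach above. It builds the sample text from the code points shown in the question so the script itself stays plain ASCII, and it passes UTF-8 bytes to the parser. The variable names ($text, $before, $after) are only illustrative, and it assumes STDOUT has no encoding layer applied.

```perl
use strict;
use warnings;
use Encode qw(encode);
use XML::LibXML;

# Fullwidth sample text, spelled out as code points (U+FF26 and so on).
my $text = join '', map { chr($_) }
    (0xFF26, 0xFF35, 0xFF2C, 0xFF2C, 0x20, 0xFF37, 0xFF29, 0xFF24, 0xFF34, 0xFF28);

# Hand the parser UTF-8 bytes; with no XML declaration they are read as UTF-8.
my $doc = XML::LibXML->load_xml(string => encode('UTF-8', "<xml>$text</xml>"));

# No encoding is recorded on the document yet, so non-ASCII comes out as NCRs.
my $before = $doc->toString();

# Record an encoding; toString() now returns bytes in that encoding
# and includes it in the XML declaration.
$doc->setEncoding('UTF-8');
my $after = $doc->toString();

# $after is already a byte string, so print it without an :encoding layer.
print $after;
```

Because toString() on a document returns encoded bytes, adding something like binmode(STDOUT, ':encoding(UTF-8)') on top would likely encode the output twice. To write to disk instead, $doc->toFile('out.xml') serializes with the same document encoding (the file name is just an example).
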
Licensed under: CC-BY-SA with attribution