Question

I have transformed an XML document into another XML document using xsltproc:

xsltproc iso8859_1.xslt iso8859.xml 

But the accents do not appear correctly (my $LANG on Linux is en_US.ISO-8859-1).

If I use

xsltproc iso8859_1.xslt iso8859.xml \
| iconv --from-code=utf-8 --to-code=iso-8859-1

then the accents appear correctly (also in my resulting HTML document).

How can I make the accents appear correctly without piping the output of xsltproc through another command?


Solution

If you want to write the output XML in a specific encoding, you need to specify it on the xsl:output instruction:

<xsl:output method="xml" encoding="ISO-8859-1" />

The big benefit of configuring the encoding this way, rather than fixing it up afterwards with iconv, is that the XML serializer knows what the target encoding will be. If your stylesheet tries to output any characters that cannot be represented in the selected encoding, they are preserved as character references rather than being lost at the iconv stage. For example, this stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" encoding="ISO-8859-1" />

  <xsl:template match="/">
    <example>אבג</example>
  </xsl:template>
</xsl:stylesheet>

when run over any XML document, produces:

<?xml version="1.0" encoding="ISO-8859-1"?>
<example>&#1488;&#1489;&#1490;</example>

The three character references represent א‎, ב‎ and ג respectively (remember that Hebrew reads from right to left).
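To see the same idea outside of XSLT, here is a small Python sketch. It is only an analogy: Python's `xmlcharrefreplace` error handler stands in for what an encoding-aware XML serializer does, and it is not how xsltproc is implemented internally. The function names are made up for this illustration.

```python
# Illustration only: Python's codec error handlers stand in for the
# XML serializer; this is not xsltproc's actual implementation.

def encode_like_serializer(text, encoding="iso-8859-1"):
    """Encode text, writing unrepresentable characters as numeric
    character references (what an encoding-aware serializer can do)."""
    return text.encode(encoding, errors="xmlcharrefreplace")


def encode_strict(text, encoding="iso-8859-1"):
    """Encode text with no fallback; return None if any character
    cannot be represented in the target encoding."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


hebrew = "\u05d0\u05d1\u05d2"  # alef, bet, gimel (the letters from the example)

print(encode_like_serializer(hebrew))    # b'&#1488;&#1489;&#1490;'
print(encode_strict(hebrew))             # None
print(encode_like_serializer("\u00e9"))  # b'\xe9' (e-acute exists in ISO-8859-1)
```

An accented Latin letter such as é maps to a single ISO-8859-1 byte either way, so it is only characters outside the target encoding that need the character-reference fallback.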

Licensed under: CC-BY-SA with attribution