Encoding of input file for XSLT 2.0 function unparsed-text()

https://stackoverflow.com/questions/22558144

18-06-2023

Question

Let's say I have this file.md, encoded in UTF-8 (.md means it's in Markdown format):

Hello world
This text is encoded in UTF-8.

Then I read it using the function unparsed-text('file.md', 'UTF-8'). That works like a charm.

The problem shows up when I use a character specific to my native language (Czech), for example in this file2.md:

Hello world
This character "š" is read like "sh" in English.

Using the same encoding parameter in unparsed-text(), I get this error:

XTDE1200: Failed to read input file file:/C:/file2.md (java.nio.charset.MalformedInputException): Input length = 1

file2.md has the same encoding (UTF-8) as file.md, and Czech characters are in this charset, yet the XSLT processor doesn't accept it. If I change the encoding parameter to windows-1250, i.e. unparsed-text('file2.md', 'windows-1250'), it works nicely.
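The difference between the two calls is visible at the byte level. Below is a minimal Python sketch (Python rather than XSLT, used only to inspect the bytes), assuming the file was saved by an editor using windows-1250, which Python names "cp1250"; the sample line is taken from file2.md. It shows that "š" is two bytes in UTF-8 but a single byte in windows-1250, and that a strict UTF-8 decode fails on exactly one byte, comparable to the "Input length = 1" in the Java error.

```python
# Second line of the hypothetical file2.md (sample text from the question).
line = 'This character "š" is read like "sh" in English.'

# Assumption: the editor saved the file as windows-1250 ("cp1250" in Python).
cp1250_bytes = line.encode("cp1250")
utf8_bytes = line.encode("utf-8")

print("š".encode("utf-8").hex())   # c5a1: two bytes in UTF-8
print("š".encode("cp1250").hex())  # 9a: one byte in windows-1250

# A strict UTF-8 decode of the windows-1250 bytes fails on the lone 0x9A byte,
# comparable to Java's MalformedInputException with "Input length = 1".
malformed = None
try:
    cp1250_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    malformed = (err.start, err.end - err.start)  # (byte offset, length)
    print(f"malformed byte at offset {err.start}, length {err.end - err.start}")

# Decoding with the encoding the bytes were really written in succeeds,
# and so does decoding genuine UTF-8 bytes as UTF-8.
print(cp1250_bytes.decode("cp1250") == line)  # True
print(utf8_bytes.decode("utf-8") == line)     # True
```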

So the question is: why do I get this error? Is it related to the fact that the input file has the extension .md (.txt works)? Is there a way around it? I really want to be able to use the same encoding in my XSL stylesheet as the supplied input file has.

Thanks for any answers.

Solution

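As Martin says, the evidence you have provided suggests that file2.md is not actually UTF-8 but is encoded in a single-byte Windows code page (Windows-1252, or Windows-1250 given that this is the encoding that works for you), and that unparsed-text('file2.md', 'UTF-8') is therefore right to reject it. In UTF-8, "š" (U+0161) is stored as the two bytes 0xC5 0xA1; in both of those Windows code pages it is the single byte 0x9A, which is not a valid UTF-8 sequence on its own. That one malformed byte is what "Input length = 1" in the error refers to.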
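If the goal is to keep unparsed-text('file2.md', 'UTF-8') in the stylesheet, the input has to really be UTF-8, either by re-saving it as UTF-8 in the editor or by transcoding it before the transformation. The following is a hedged Python sketch of the transcoding route, not part of any XSLT processor: the helper name to_utf8 is hypothetical, and it assumes that any file which is not valid UTF-8 was written in windows-1250 ("cp1250"). A strict UTF-8 decode of windows-1250 text containing accented letters usually fails, which is what the check relies on.

```python
import tempfile
from pathlib import Path


def to_utf8(path: Path, fallback: str = "cp1250") -> str:
    """Hypothetical helper: re-save a text file as UTF-8 and return the
    encoding it was read with. Assumes non-UTF-8 files use `fallback`."""
    data = path.read_bytes()
    try:
        data.decode("utf-8")  # strict: raises on any malformed byte
        return "utf-8"        # already valid UTF-8, leave the file alone
    except UnicodeDecodeError:
        text = data.decode(fallback)             # assumed original code page
        path.write_bytes(text.encode("utf-8"))   # bytes keep line endings as-is
        return fallback


# Usage: simulate file2.md as saved by a windows-1250 editor, then convert it.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "file2.md"
    content = 'Hello world\nThis character "š" is read like "sh" in English.\n'
    sample.write_bytes(content.encode("cp1250"))
    first = to_utf8(sample)   # the file was not valid UTF-8, so it is converted
    second = to_utf8(sample)  # now it is valid UTF-8, so nothing changes
    converted = sample.read_bytes()

print(first, second)
```

Writing bytes rather than using write_text avoids newline translation on Windows, so the only change to the file is the character encoding.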

Licensed under: CC-BY-SA with attribution
