Question

I am creating an XML 1.0 application that describes files (like others have done). At present I have a file element that requires a name attribute; the value of that attribute is the name of the file.

But I believe this will not work. File names that contain the special characters &, <, ' and " are tricky, but you can use the predefined entity references for those. But what about file names that contain control characters? Although very rare, these are possible.

It seems to me there is no way to create an XML application for my purpose, because XML 1.0 does not permit control characters other than tab, line feed and carriage return anywhere in the text. Quoth the standard:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

Are there any tricks to get around this? Will it work in XML 1.1, or does that too have limitations?


On my GNU/Linux computer, I can do this to create two files with control characters in their names:

 touch `echo -e 'SP\a'`
 touch `echo -e 'SP\v'`
 ls SP*

Solution

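In XML 1.1 you can represent all characters except NUL (code point 0). Control characters must be escaped as numeric character references, for example &#x7; for BEL (\a) and &#xB; for vertical tab (\v). XML 1.0 offers no such route: a character reference there must still match the Char production, so &#x7; is not well-formed in an XML 1.0 document.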
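As a rough illustration of the XML 1.1 route, here is a minimal Python sketch that builds the attribute value by hand. The names escape_xml11_attr and file_document are hypothetical helpers made up for this example, and the sketch assumes the file name is already available as a Unicode string. It only generates text; the receiving parser must itself support XML 1.1, which not every parser does.

```python
def escape_xml11_attr(name: str) -> str:
    """Escape a string for use inside a double-quoted XML 1.1 attribute value.

    Hypothetical helper: it does not check for surrogates or U+FFFE/U+FFFF,
    which XML 1.1 also excludes.
    """
    predefined = {
        "&": "&amp;",
        "<": "&lt;",
        ">": "&gt;",
        '"': "&quot;",
        "'": "&apos;",
    }
    out = []
    for ch in name:
        cp = ord(ch)
        if cp == 0:
            # NUL is the one control character XML 1.1 cannot carry at all.
            raise ValueError("NUL cannot be represented in XML 1.1")
        if ch in predefined:
            out.append(predefined[ch])
        elif cp < 0x20 or 0x7F <= cp <= 0x9F or cp == 0x2028:
            # Control characters go out as numeric character references.
            # Tab, LF, CR, NEL and U+2028 are included so that line-end and
            # attribute-value normalization cannot alter them.
            out.append(f"&#x{cp:X};")
        else:
            out.append(ch)
    return "".join(out)


def file_document(name: str) -> str:
    """Return a tiny XML 1.1 document holding one file element."""
    return f'<?xml version="1.1"?>\n<file name="{escape_xml11_attr(name)}"/>\n'


if __name__ == "__main__":
    # The two file names from the question: SP + BEL and SP + vertical tab.
    print(file_document("SP\a"))
    print(file_document("SP\v"))
```

The version="1.1" declaration matters here: without it the document is treated as XML 1.0, where these references are rejected.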

If you need all characters including NUL, you will need to define your own escape convention. You could adopt the convention used for URIs (%HH) or the convention used in Java (\uNNNN), or you could invent your own.
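The URI-style convention could be sketched in Python as below. This is one possible design, not a prescribed one: encode_name, decode_name and file_element are hypothetical names, and the sketch assumes the file name is handled as raw bytes and that every byte outside letters, digits and "_.-~" is written as %HH. Because the escaped value is plain ASCII, it fits in an ordinary XML 1.0 document, and NUL poses no problem.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, unquote_to_bytes


def encode_name(raw: bytes) -> str:
    """Percent-escape a raw file name (hypothetical helper).

    With safe="" everything except ASCII letters, digits and "_.-~" becomes
    %HH, including "%" itself, so the encoding is reversible.
    """
    return quote(raw, safe="")


def decode_name(escaped: str) -> bytes:
    """Reverse encode_name and recover the original bytes."""
    return unquote_to_bytes(escaped)


def file_element(raw: bytes) -> str:
    """Serialize a file element whose name attribute holds the escaped name."""
    elem = ET.Element("file", {"name": encode_name(raw)})
    return ET.tostring(elem, encoding="unicode")


if __name__ == "__main__":
    original = b"SP\x07"  # the SP + BEL file name from the question
    xml_text = file_element(original)
    print(xml_text)

    # Round trip through an XML 1.0 parser, then undo the escaping.
    parsed = ET.fromstring(xml_text)
    assert decode_name(parsed.get("name")) == original
```

The cost of this approach is that consumers of the XML must know about the convention and decode the attribute themselves; the XML parser only ever sees the escaped form.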

Licensed under: CC-BY-SA with attribution