Question

I have the "unit separator" character (0x1F) stored in a database.

All I want is to use MSXML6.dll to export the unit separator to an XML 1.0 file.

Here are the problems I ran into:

  1. Writing 0x1F into the XML file directly produces an error message, and the attribute ends up as an empty string.

  2. Replacing it with the numeric character reference "&#x1F;" and then writing that into the XML file turns it into "&amp;#x1F;", which is disappointing.

  3. If I manually edit the XML file to change "&amp;#x1F;" back to "&#x1F;", the XML parser fails with the exception "Invalid Unicode Character".
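The three symptoms above are not specific to MSXML. As a rough sketch, they can be reproduced with Python's standard-library parser (expat, via xml.etree.ElementTree), which is assumed here to behave like any conforming XML 1.0 parser; the helper name try_parse and the element names are made up for the illustration, and the exact error text will differ from MSXML's.

```python
import xml.etree.ElementTree as ET

def try_parse(xml_text):
    """Return None if the text parses, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as exc:
        return str(exc)

# Pain 1: a raw 0x1F character inside the document is not well-formed XML 1.0.
raw_error = try_parse('<root><unit a="x\x1fy"/></root>')

# Pain 3: the character reference &#x1F; is rejected as well.
ref_error = try_parse('<root><unit a="x&#x1F;y"/></root>')

# Pain 2: a serializer treats "&" in data as a literal ampersand and escapes it.
elem = ET.Element("unit")
elem.text = "a&#x1F;b"
escaped = ET.tostring(elem, encoding="unicode")

print("raw 0x1F:", raw_error)
print("&#x1F;  :", ref_error)
print("escaped :", escaped)
```

Both parse attempts fail, and the serializer emits "&amp;#x1F;", so the character has to be handled before the XML layer sees it.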

The question: if I cannot use XML 1.1, what is the best way to write the "unit separator" into an XML file so that it can be imported again?

Note: One possible solution is to replace the "unit separator" with some STRANGE string like "$". But is that a good choice at all? What do you think of using "0x1F", "#x1F", or "#x1F;" instead of "&#x1F;"? Which is better, or are there better candidates?


Summary:

Let's make an analogy with how a compiler works. There are two phases: "Pre-compile" and "Compile".

XML file generation acts like the "Compile" phase, e.g. it converts "<" to "&lt;".

However, the unit separator is not a legal character in XML 1.0, so the "Compile" phase will not convert it to the numeric character reference "&#x1F;".

So we have to look for a solution in the "Pre-compile" phase, which is our own application's responsibility.
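To make "not supported by XML 1.0" concrete: the XML 1.0 specification defines the legal characters in its Char production, and 0x1F falls outside it. A minimal sketch of a "Pre-compile" check follows; the function names is_xml10_char and find_illegal are hypothetical, chosen only for this example.

```python
def is_xml10_char(ch):
    """True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def find_illegal(text):
    """List (index, code point) pairs for characters XML 1.0 cannot carry."""
    return [(i, hex(ord(ch))) for i, ch in enumerate(text) if not is_xml10_char(ch)]

print(find_illegal("aaa\x1fbbb"))  # [(3, '0x1f')]
```

Running such a scan before serialization shows which values need special treatment, instead of finding out from a parser exception on import.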

When writing:

Option 1: store each part in its own element, e.g. <unit>aaa</unit><unit>bbb</unit>
Option 2: simply replace "\37" (octal for 0x1F) with "_x241F_" in the string, provided "_x241F_" does not conflict with any existing token in the string.

When reading:

For Option 1: load the elements and concatenate their contents into a single string with "\37" as the separator.
For Option 2: simply replace "_x241F_" with "\37".
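Both options can be sketched as a write/read round trip. The code below uses Python's xml.etree.ElementTree rather than MSXML, purely to illustrate the idea; the wrapper element <units> and all function names are assumptions made for this example, not part of the original design.

```python
import xml.etree.ElementTree as ET

US = "\x1f"          # unit separator, "\37" in octal
TOKEN = "_x241F_"    # stand-in token used by Option 2

def write_option1(value):
    """Option 1: one <unit> element per part (hypothetical <units> wrapper)."""
    parent = ET.Element("units")
    for part in value.split(US):
        ET.SubElement(parent, "unit").text = part
    return ET.tostring(parent, encoding="unicode")

def read_option1(xml_text):
    """Option 1: join the element contents back together with 0x1F."""
    parent = ET.fromstring(xml_text)
    return US.join(child.text or "" for child in parent.findall("unit"))

def write_option2(value):
    """Option 2: replace 0x1F with the token, refusing ambiguous input."""
    if TOKEN in value:
        raise ValueError("value already contains the token " + TOKEN)
    elem = ET.Element("units")
    elem.text = value.replace(US, TOKEN)
    return ET.tostring(elem, encoding="unicode")

def read_option2(xml_text):
    """Option 2: turn the token back into 0x1F."""
    return (ET.fromstring(xml_text).text or "").replace(TOKEN, US)

original = "aaa" + US + "bbb"
xml1 = write_option1(original)
xml2 = write_option2(original)
print(xml1)
print(xml2)
print(read_option1(xml1) == original, read_option2(xml2) == original)
```

Option 1 needs no reserved token, so it cannot collide with real data, but it changes the document structure. Option 2 keeps the structure and relies on the conflict check in write_option2.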

I've also found out that MSXML (even the highest version, MSXML6.dll) will not load XML 1.1.

So if we are unfortunate enough to be using MSXML, we have to write our own "Pre-compile" code to handle such characters before feeding the string to the "Compile" phase.

Note: I borrowed the idea of "_x241F_" from here. Thanks for everyone's help.


Solution

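Maybe appending an internal DTD to the XML file might work for you? Two caveats: an entity name cannot start with a digit, so the entity is called US below rather than 0x1F; and XML 1.0 rejects the character reference &#x1F; even inside an entity value, so the entity expands to the visible stand-in U+241F (the same code point as in "_x241F_"), which the importing application would then map back to 0x1F.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [<!ENTITY US "&#x241F;">]>
<root>
  <Units>Unit1&US;Unit2</Units>
</root>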
Licensed under: CC-BY-SA with attribution