Question
I have the "unit separator" character (0x1F) stored in a database.
All I want is to use MSXML6.dll to export the unit separator to an XML 1.0 file.
Here are the problems I have run into:
Writing 0x1F into the XML file directly produces an error message, and the attribute ends up as an empty string.
Replacing it with the character reference "&#x1F;" before writing gives "&amp;#x1F;" in the XML file, because the "&" itself gets escaped, which is disappointing.
If I manually edit the XML file to change "&amp;#x1F;" back to "&#x1F;", the XML parser fails with the exception "Invalid Unicode Character".
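The three failures above are not specific to MSXML. The following is a minimal sketch in Python, assuming its bundled expat-based `xml.etree.ElementTree` parser enforces the same XML 1.0 character rules; the helper name `is_well_formed` is made up for illustration, and MSXML's own error messages will differ.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

US = "\x1f"  # the unit separator character

def is_well_formed(xml_text):
    """Return True if the XML 1.0 parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# 1. The raw character is not a legal XML 1.0 character.
raw_ok = is_well_formed("<root><Units>Unit1" + US + "Unit2</Units></root>")

# 2. Text that already looks like a character reference gets its "&" escaped on output.
escaped = escape("Unit1&#x1F;Unit2")

# 3. A hand-written character reference to U+001F is rejected as well.
ref_ok = is_well_formed("<root><Units>Unit1&#x1F;Unit2</Units></root>")

print(raw_ok, escaped, ref_ok)
```

In XML 1.0 a character reference must still refer to a legal character, so the third case fails for the same underlying reason as the first.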
The question: if I cannot use XML 1.1, what is the best way to write the unit separator into an XML file so that it can be imported again?
Note: One possible solution is to replace the unit separator with some strange string like "$". But is that a good name at all? What is your opinion of using "0x1F", "#x1F", or "#x1F;" instead? Which is better, or are there better candidates?
Summary:
Let's use an analogy with how a compiler works. There are two phases: "pre-compile" and "compile".
XML file generation acts like the "compile" phase, e.g. it converts "<" to "&lt;".
However, the unit separator is not supported by XML 1.0, so the "compile" phase will not convert it to the character reference "&#x1F;".
So we have to look for a solution in the "pre-compile" phase, which is our own application's responsibility.
When writing:
Option1: <unit>aaa</unit><unit>bbb</unit>
Option2: simply replace "\37" in the string with "_x241F_", provided "_x241F_" does not conflict with any existing token in the string.
When reading:
For Option1: load the elements and concatenate them into a single string with "\37" as the separator.
For Option2: simply replace "_x241F_" with "\37".
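Option1 can be sketched as follows. This is only an illustration in Python with `xml.etree.ElementTree`, not MSXML code; the function names `write_units` and `read_units` and the wrapping `root`/`Units` elements are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

US = "\x1f"  # the unit separator, "\37" in octal notation

def write_units(value):
    """Split on the unit separator and emit one <unit> element per piece."""
    root = ET.Element("root")
    units = ET.SubElement(root, "Units")
    for piece in value.split(US):
        ET.SubElement(units, "unit").text = piece
    return ET.tostring(root, encoding="unicode")

def read_units(xml_text):
    """Load the <unit> elements and join them with the unit separator."""
    root = ET.fromstring(xml_text)
    # An empty element is read back with text None, so map that to "".
    return US.join(unit.text or "" for unit in root.iter("unit"))

xml_text = write_units("aaa" + US + "bbb")
print(xml_text)
print(read_units(xml_text) == "aaa" + US + "bbb")
```

Because the separator never appears in the file at all, this variant needs no placeholder and cannot collide with existing data.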
I have also found that MSXML (even the highest version, MSXML6.dll) will not load XML 1.1.
So if we are unfortunate enough to be using MSXML, we have to write our own "pre-compile" code to handle such characters before feeding the "compile" phase.
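A "pre-compile" step for Option2 might look like the sketch below. It is written in Python rather than against MSXML, and the names `encode_us` and `decode_us` are hypothetical; the conflict check simply refuses data that already contains the placeholder.

```python
import xml.etree.ElementTree as ET

US = "\x1f"        # the unit separator, "\37" in octal notation
TOKEN = "_x241F_"  # placeholder written to the file instead of the separator

def encode_us(value):
    """Pre-compile step: swap the separator for the placeholder before writing."""
    if TOKEN in value:
        raise ValueError("placeholder already occurs in the data")
    return value.replace(US, TOKEN)

def decode_us(value):
    """Reverse step after reading: restore the separator."""
    return value.replace(TOKEN, US)

original = "Unit1" + US + "Unit2"
elem = ET.Element("Units")
elem.text = encode_us(original)
xml_text = ET.tostring(elem, encoding="unicode")

restored = decode_us(ET.fromstring(xml_text).text)
print(xml_text)
print(restored == original)
```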
Note: I borrowed the idea of "_x241F_" from here. Thanks for everyone's help.
Solution
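Maybe adding an internal DTD to the XML file would work for you? Entity names have to be valid XML names and so cannot start with a digit, which is why the entity is called US here rather than 0x1F. The replacement text inside the quotes must itself consist of legal XML 1.0 characters, so it cannot be the raw 0x1F character.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [<!ENTITY US "">]>
<root>
  <Units>Unit1&US;Unit2</Units>
</root>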