Question

I'm trying to create an XML document from strings that contain some special characters. The .NET XML classes seem to be escaping the characters incorrectly. The generated XML isn't accepted by any parsers that I've tried.

XmlDocument doc = new XmlDocument();
XmlElement root = doc.CreateElement("root");
doc.AppendChild(root);

XmlElement elem = doc.CreateElement("elem1");
root.AppendChild(elem);

byte[] bytes = new byte[4];
bytes[0] = 0;
bytes[1] = 0;
bytes[2] = (byte)0x80;
bytes[3] = 0x1c;
String val = Encoding.ASCII.GetString(bytes);

elem.AppendChild(doc.CreateTextNode(val));

// Here root.OuterXml is:
<root><elem1>&#x0;&#x0;?&#x1C;</elem1></root>

Solution

You are getting invalid XML because you are trying to store characters that are not allowed in an XML document.

According to the XML specification, these are the only characters allowed in a well-formed XML document (which is what the .NET XML classes are meant to produce):

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

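In your example you are trying to store 0x0 and 0x1C, which fall outside the allowed ranges. The only byte that yields a valid character is 0x80, and even that one is not preserved: 0x80 is outside the 7-bit ASCII range, so Encoding.ASCII.GetString decodes it as '?', which is what appears in your output.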

So you cannot expect valid XML from the .NET XML classes if you feed them characters that the XML standard does not allow.

To fix this, make sure you remove any characters that fall outside the ranges defined by the XML specification:

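// Keep only bytes that decode to characters allowed by the XML Char production.
byte[] bytes = new byte[1];
bytes[0] = 0x80; // Note: Encoding.ASCII still decodes this byte as '?'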
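
If the input strings are not under your control, the removal step described above could be done on the decoded string rather than on the raw bytes. The following is only a minimal sketch of that idea: the class name XmlText and the method StripInvalidXmlChars are hypothetical names introduced for illustration, and the range checks are copied directly from the Char production quoted earlier.

```csharp
using System;
using System.Text;

public static class XmlText
{
    // Hypothetical helper (not part of .NET): returns a copy of the input
    // without the characters that the XML 1.0 Char production disallows.
    public static string StripInvalidXmlChars(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];

            // A well-formed surrogate pair encodes a code point in
            // [#x10000-#x10FFFF], which the production allows.
            if (char.IsHighSurrogate(c) && i + 1 < input.Length
                && char.IsLowSurrogate(input[i + 1]))
            {
                sb.Append(c).Append(input[i + 1]);
                i++; // skip the low surrogate that was just copied
            }
            else if (IsAllowedBmpChar(c))
            {
                sb.Append(c);
            }
            // Anything else (control characters, lone surrogates,
            // #xFFFE, #xFFFF) is dropped.
        }
        return sb.ToString();
    }

    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
    private static bool IsAllowedBmpChar(char c)
    {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }
}
```

With the string from the question, calling doc.CreateTextNode(XmlText.StripInvalidXmlChars(val)) would leave only the '?' in the text node. Whether silently dropping characters is acceptable depends on your data; this sketch assumes it is.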
Licensed under: CC-BY-SA with attribution