Question

Hey guys, I have an application that saves data using XML. I recently received a bug report from a user who is unable to open her data file. The parser apparently failed when it encountered a bad character.

Fortunately, I have a copy of her data file, so I was able to find the culprit, but I don't understand what it is or how it got there, since this is all keyboard input from the user. The offending characters are in this attribute:

attributeName="Some text then XXX"

Where the "XXX" is, according to a hex editor:

0A 0A 00

The 0A is a line feed as far as I can tell, but how the heck would a null character end up there?! This is the first bug I've seen of this nature.

Solution

I think the problem you are seeing is a little-known but very serious flaw in XML. In a nutshell: XML 1.0 values cannot contain certain characters. Not only can those characters not appear literally in XML text, they cannot even be escaped using &#DDDD; notation.

The valid XML character set is defined at http://www.w3.org/TR/REC-xml/#charsets, and it is: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
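
As a rough sketch of that rule, the check below walks a string by code point and reports whether every character falls inside one of the allowed ranges. The names XmlCharCheck, IsValidXmlCodePoint and IsValidXmlString are made up for illustration and are not part of any library; .NET also ships XmlConvert.IsXmlChar for testing a single UTF-16 code unit.

```csharp
using System;

static class XmlCharCheck
{
    // Mirrors the XML 1.0 character ranges quoted above.
    public static bool IsValidXmlCodePoint(int cp) =>
        cp == 0x9 || cp == 0xA || cp == 0xD ||
        (cp >= 0x20 && cp <= 0xD7FF) ||
        (cp >= 0xE000 && cp <= 0xFFFD) ||
        (cp >= 0x10000 && cp <= 0x10FFFF);

    public static bool IsValidXmlString(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            int cp;
            if (char.IsSurrogatePair(s, i))
            {
                // Combine the high/low surrogate pair into one code point.
                cp = char.ConvertToUtf32(s, i);
                i++; // skip the low surrogate
            }
            else
            {
                // A lone surrogate lands in D800-DFFF and is rejected below.
                cp = s[i];
            }
            if (!IsValidXmlCodePoint(cp)) return false;
        }
        return true;
    }
}

class Demo
{
    static void Main()
    {
        // Line feeds (0A) are allowed; the trailing NUL (00) is not.
        Console.WriteLine(XmlCharCheck.IsValidXmlString("Some text then \n\n"));
        Console.WriteLine(XmlCharCheck.IsValidXmlString("Some text then \n\n\0"));
    }
}
```

Running a check like this on user input before saving would flag the 0A 0A 00 sequence from the question at the point where it enters the application, rather than when the file is reopened.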

That means that if your string contains any character outside this set, it cannot be serialized as an XML value. The only way to store such a string is to treat it as binary data and serialize it as base64.
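
A minimal sketch of the base64 approach, assuming the text is well-formed UTF-16 (an unpaired surrogate would not survive the UTF-8 step). Base64Text, Encode and Decode are hypothetical helper names; the Convert and Encoding calls are the standard .NET ones.

```csharp
using System;
using System.Text;

static class Base64Text
{
    // Turn arbitrary text into base64, whose alphabet is always XML-safe.
    public static string Encode(string raw) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(raw));

    // Reverse the encoding after the XML has been read back.
    public static string Decode(string encoded) =>
        Encoding.UTF8.GetString(Convert.FromBase64String(encoded));
}

class Demo
{
    static void Main()
    {
        string raw = "Some text then \n\n\0";   // contains the NUL from the question
        string safe = Base64Text.Encode(raw);   // value to store in the attribute
        Console.WriteLine(safe);
        Console.WriteLine(Base64Text.Decode(safe) == raw);
    }
}
```

The trade-off is that the stored value is no longer human-readable in the file, so this probably only makes sense for fields that can legitimately hold arbitrary data.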

Many popular frameworks, including MSXML and .NET, will let you put bad data into XML values and will then refuse to deserialize the resulting XML. Here's an example (screenshot: http://vvcap.net/db/Db94W-13uwCkNXSZTitO.htp) with source code:

using System;
using System.IO;
using System.Xml.Serialization;

[Serializable] public class TestClass
{
    [XmlAttribute]
    public string Member { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        var ser = new XmlSerializer(typeof(TestClass));

        // U+0019 is outside the valid XML character set.
        // (\u0019 is used instead of \x19 because \x escapes are variable-length in C#.)
        var tc = new TestClass() { Member = "zzz \u0019 zzz" };

        // Serializing succeeds, even though the value cannot be represented in XML.
        var stream = new StringWriter();
        ser.Serialize(stream, tc);
        var xml = stream.ToString();

        // Reading the same XML back is where the failure shows up.
        var stream2 = new StringReader(xml);
        var tc2 = ser.Deserialize(stream2);
    }
}
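
One way to surface the problem at save time instead of load time might be to hand the serializer an XmlWriter that checks characters, rather than a bare TextWriter. The sketch below reuses TestClass from the example above; TrySerialize is a hypothetical helper, and the exact exception type that reaches the caller is an assumption on my part, so both likely candidates are caught.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class TestClass
{
    [XmlAttribute]
    public string Member { get; set; }
}

class Demo
{
    // Returns the XML text, or null if the writer rejected a character.
    public static string TrySerialize(TestClass value)
    {
        var ser = new XmlSerializer(typeof(TestClass));
        // CheckCharacters is true by default; set explicitly to make the intent visible.
        var settings = new XmlWriterSettings { CheckCharacters = true };
        var text = new StringWriter();
        try
        {
            using (var writer = XmlWriter.Create(text, settings))
            {
                ser.Serialize(writer, value);
            }
            return text.ToString();
        }
        catch (Exception ex) when (ex is InvalidOperationException || ex is ArgumentException)
        {
            // Assumption: the serializer wraps the writer's error, so accept either type.
            Console.WriteLine("Rejected at write time: " + ex.Message);
            return null;
        }
    }

    static void Main()
    {
        Console.WriteLine(TrySerialize(new TestClass { Member = "zzz zzz" }) != null);
        Console.WriteLine(TrySerialize(new TestClass { Member = "zzz \u0019 zzz" }) != null);
    }
}
```

With this arrangement a bad value fails while the user's data is still in memory, which leaves room to strip or encode the character instead of producing a file that cannot be reopened.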

License: CC-BY-SA with attribution
Not affiliated with StackOverflow