Question

Anything below 0x20 (except for 0x09, 0x0a and 0x0d, i.e. tab, line feed and carriage return) cannot be included in an XML document.

I have some data coming out of a Database and being passed as a response to a Web Service request.

The SOAP formatter happily encodes the 0x12 character (ASCII 18, Device Control 2) as &#x12;, but the response fails on the client with "hexadecimal value 0x12, is an invalid character".

<rant> What I find quite frustrating is these are two sides of the same coin, both client and service are .net apps. Why will the soap formatter write bad xml if nothing can read it?</rant>

I'd like to either

  1. Get the Xml Serialiser to handle these odd characters correctly or
  2. Have the request fail in the Web Service

I've googled and couldn't find much on this other than a) "sanitise your inputs" or b) "change your document structure".

a) isn't a runner, as some of this data is 20+ years old.
b) isn't much of an option either: other than our own front end, we have clients that code against the Web Service directly.

Is there something obvious I'm missing? Or is it simply a case of coding around ASCII control codes?

Thanks

Update
This is actually a problem with the XmlSerializer: the following code will serialise an invalid character to the stream, but will not deserialise it.

using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

[Serializable]
public class MyData
{
    public string Text { get; set; }
}
class Program
{
    public static void Main(string[] args)
    {
        var myData = new MyData {Text = "hello " 
                + ASCIIEncoding.ASCII.GetString(new byte[] { 0x12 }) 
                + " world"};

        var serializer = new XmlSerializer(typeof(MyData));

        var xmlWriter = new StringWriter();

        serializer.Serialize(xmlWriter, myData);

        var xmlReader = new StringReader(xmlWriter.ToString());

        var newData = (MyData)serializer.Deserialize(xmlReader); // Exception 
        // hexadecimal value 0x12, is an invalid character.

    }
}

I can get it to choke when writing the XML by explicitly creating an XmlWriter and passing that to Serialize (I'll post that shortly as my own answer), but that still means I have to sanitise my data before sending it.
As these characters are significant, I can't just strip them. I need to encode them before transmission and decode them when read, and I'm really quite surprised that there doesn't appear to be an existing framework method to do this.
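
To illustrate the encode-before-transmission, decode-on-read idea, here is a minimal sketch. `ControlCharCodec`, `Encode` and `Decode` are hypothetical names made up for this example, not framework methods, and the `\uXXXX` escape format is an arbitrary choice. It assumes .NET Framework 4.0 or later for `XmlConvert.IsXmlChar`, and that both ends of the wire agree to apply the codec to every string field.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical helper, not part of the .NET Framework.
public static class ControlCharCodec
{
    // Matches what Encode produces: a backslash followed by either another
    // backslash or 'u' plus four hex digits.
    private static readonly Regex Escaped =
        new Regex(@"\\(\\|u[0-9A-Fa-f]{4})", RegexOptions.Compiled);

    // Replaces characters that XML 1.0 forbids with \uXXXX and doubles any
    // existing backslash so that the transformation can be reversed.
    public static string Encode(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c == '\\')
                sb.Append(@"\\");
            else if (XmlConvert.IsXmlChar(c) || char.IsSurrogate(c))
                sb.Append(c); // surrogates pass through; unpaired ones are not handled here
            else
                sb.Append("\\u").Append(((int)c).ToString("X4", CultureInfo.InvariantCulture));
        }
        return sb.ToString();
    }

    // Reverses Encode: \\ becomes \ and \uXXXX becomes the original character.
    public static string Decode(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        return Escaped.Replace(text, m =>
        {
            string body = m.Groups[1].Value;
            if (body == @"\") return @"\";
            int code = int.Parse(body.Substring(1), NumberStyles.HexNumber,
                                 CultureInfo.InvariantCulture);
            return ((char)code).ToString();
        });
    }
}
```

The service would call `ControlCharCodec.Encode(myData.Text)` before returning the object and the client would call `ControlCharCodec.Decode` after deserialising it. Clients that code against the Web Service directly would need to know about the escape format, which is the main cost of this approach.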


Solution

Second : A Solution

Using the DataContractSerializer (which is used by default for WCF services) instead of the XmlSerializer works a treat:

using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;

[Serializable]
public class MyData
{
    public string Text { get; set; }
}
class Program
{
    public static void Main(string[] args)
    {
        var myData = new MyData
        {
            Text = "hello "
                + ASCIIEncoding.ASCII.GetString(new byte[] { 0x12 })
                + " world"
        };

        var serializer = new DataContractSerializer(typeof(MyData));

        var mem = new MemoryStream();

        serializer.WriteObject(mem, myData);

        mem.Seek(0, SeekOrigin.Begin);
        MyData myData2 = (MyData)serializer.ReadObject(mem);

        Console.WriteLine("myData2 {0}", myData2.Text);
    }
}

First : A Workaround

I can get it to choke when writing the XML by using an XmlWriter, which is arguably better than the client choking on it. However, it doesn't fix the underlying problem of sending the invalid characters. For example:

using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

[Serializable]
public class MyData
{
    public string Text { get; set; }
}
class Program
{
    public static void Main(string[] args)
    {
        var myData = new MyData {Text = "hello "
            + ASCIIEncoding.ASCII.GetString(new byte[] { 0x12 })
            + " world"};
        var serializer = new XmlSerializer(typeof(MyData));

        var sw = new StringWriter();
        // CheckCharacters is true by default; set explicitly to show intent
        var settings = new XmlWriterSettings { CheckCharacters = true };

        using (var writer = XmlWriter.Create(sw, settings))
        {
            serializer.Serialize(writer, myData); // Exception
            // hexadecimal value 0x12, is an invalid character
        }

        // Not reached for this input, because Serialize throws above
        var xmlReader = new StringReader(sw.ToString());

        var newData = (MyData)serializer.Deserialize(xmlReader);

        Console.WriteLine("Text = {0}", newData.Text);
    }
}
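
If the goal is option 2 from the question (have the request fail in the Web Service rather than on the client), the workaround can be wrapped in a helper that reports the failure instead of letting it escape. This is only a sketch: `StrictXml` and `TrySerialize` are hypothetical names, and it assumes, as I understand the behaviour, that XmlSerializer wraps the writer's error in an InvalidOperationException. An ArgumentException is caught as well in case that assumption doesn't hold.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical helper, not part of the .NET Framework.
public static class StrictXml
{
    // Serializes through an XmlWriter with character checking switched on,
    // so characters that are invalid in XML fail on the writing side.
    public static bool TrySerialize<T>(T value, out string xml, out Exception error)
    {
        var serializer = new XmlSerializer(typeof(T));
        var settings = new XmlWriterSettings { CheckCharacters = true };
        var sw = new StringWriter();
        try
        {
            using (var writer = XmlWriter.Create(sw, settings))
            {
                serializer.Serialize(writer, value);
            }
            xml = sw.ToString();
            error = null;
            return true;
        }
        catch (InvalidOperationException ex)
        {
            // Assumed wrapper exception; the inner one names the offending character.
            xml = null;
            error = ex.InnerException ?? ex;
            return false;
        }
        catch (ArgumentException ex)
        {
            // Fallback in case the writer's exception surfaces unwrapped.
            xml = null;
            error = ex;
            return false;
        }
    }
}
```

A service method could call `StrictXml.TrySerialize(myData, out xml, out error)` before returning and turn a `false` result into a fault, so the bad record is reported where the data lives.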

Other answers

A combination of Binary Worrier's post with an inserted special character filter works pretty well to filter the object right before the return of it:

public List<MyData> MyWebServiceMethod()
{
    var mydata = GetMyData();
    return Helper.ScrubObjectOfSpecialCharacters<List<MyData>>(mydata);
}

Helper class:

public static T ScrubObjectOfSpecialCharacters<T>(T obj)
{
    var serializer = new XmlSerializer(obj.GetType());

    using (StringWriter writer = new StringWriter())
    {
        serializer.Serialize(writer, obj);

        string content = writer.ToString();

        content = FixSpecialCharacters(content);

        using (StringReader reader = new StringReader(content))
        {
            obj = (T)serializer.Deserialize(reader);
        }
    }
    return obj;
}
public static string FixSpecialCharacters(string input)
{
    if (string.IsNullOrEmpty(input)) return input;

    StringBuilder output = new StringBuilder();
    for (int i = 0; i < input.Length; i++)
    {
        int charCode = (int)input[i];
        switch (charCode)
        {
            case 8211:
            case 8212:
                {
                    // replace en dash (U+2013) and em dash (U+2014) with a plain hyphen
                    output.Append('-');
                    break;
                }
            default:
                {
                    if ((31 < charCode && charCode < 127) || charCode == 9)
                    {
                        output.Append(input[i]);
                    }
                    break;
                }
        }
    }
    return output.ToString();
}
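
Note that FixSpecialCharacters keeps only tab and printable ASCII (codes 32 to 126), so it also drops line breaks and every non-ASCII character, and it runs over the serialised XML rather than over the original strings. If the serialiser has already written a control character in an encoded form made of printable characters, a filter at that stage may not catch it. As a hedged alternative, the sketch below drops only the characters that XML 1.0 forbids and is meant to be applied to individual string values before serialisation. `XmlCharScrubber` and `RemoveInvalidXmlChars` are hypothetical names, and `XmlConvert.IsXmlChar` requires .NET Framework 4.0 or later.

```csharp
using System.Text;
using System.Xml;

// Hypothetical helper, not part of the .NET Framework.
public static class XmlCharScrubber
{
    // Removes characters that are not allowed in an XML 1.0 document and
    // leaves everything else, including non-ASCII text, untouched.
    public static string RemoveInvalidXmlChars(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;

        var output = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            if (char.IsSurrogatePair(input, i))
            {
                // A valid surrogate pair encodes a character above U+FFFF.
                output.Append(input[i]).Append(input[i + 1]);
                i++;
            }
            else if (XmlConvert.IsXmlChar(input[i]))
            {
                output.Append(input[i]);
            }
            // Anything else (control codes, unpaired surrogates) is dropped.
        }
        return output.ToString();
    }
}
```

For example, `XmlCharScrubber.RemoveInvalidXmlChars(myData.Text)` would be called on each string property before the object is returned. Like the filter above, this loses the stripped characters, so it only suits data where they are not significant.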
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow