Question

I've got a function creating some XmlDocument:

public string CreateOutputXmlString(ICollection<Field> fields)
{
    XmlWriterSettings settings = new XmlWriterSettings();
    settings.Indent = true;
    settings.Encoding = Encoding.GetEncoding("windows-1250");

    StringBuilder builder = new StringBuilder();
    XmlWriter writer = XmlWriter.Create(builder, settings);

    writer.WriteStartDocument();
    writer.WriteStartElement("data");
    foreach (Field field in fields)
    {
        writer.WriteStartElement("item");
        writer.WriteAttributeString("name", field.Id);
        writer.WriteAttributeString("value", field.Value);
        writer.WriteEndElement();
    }
    writer.WriteEndElement();
    writer.Flush();
    writer.Close();

    return builder.ToString();
}

I set an encoding but after i create XmlWriter it does have utf-16 encoding. I know it's because strings (and StringBuilder i suppose) are encoded in utf-16 and you can't change it.
So how can I easily create this xml with the encoding attribute set to "windows-1250"? it doesn't even have to be encoded in this encoding, it just has to have the specified attribute.

edit: it has to be in .Net 2.0 so any new framework elements cannot be used.

Was it helpful?

Solution

You need to use a StringWriter with the appropriate encoding. Unfortunately StringWriter doesn't let you specify the encoding directly, so you need a class like this:

public sealed class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding encoding;

    public StringWriterWithEncoding (Encoding encoding)
    {
        this.encoding = encoding;
    }

    public override Encoding Encoding
    {
        get { return encoding; }
    }
}

(This question is similar but not quite a duplicate.)

EDIT: To answer the comment: pass the StringWriterWithEncoding to XmlWriter.Create instead of the StringBuilder, then call ToString() on it at the end.

OTHER TIPS

Just some extra explanations to why this is so.

Strings are sequences of characters, not bytes. Strings, per se, are not "encoded", because they are using characters, which are stored as Unicode codepoints. Encoding DOES NOT MAKE SENSE at String level.

An encoding is a mapping from a sequence of codepoints (characters) to a sequence of bytes (for storage on byte-based systems like filesystems or memory). The framework does not let you specify encodings, unless there is a compelling reason to, like to make 16-bit codepoints fit on byte-based storage.

So when you're trying to write your XML into a StringBuilder, you're actually building an XML sequence of characters and writing them as a sequence of characters, so no encoding is performed. Therefore, no Encoding field.

If you want to use an encoding, the XmlWriter has to write to a Stream.

About the solution that you found with the MemoryStream, no offense intended, but it's just flapping around arms and moving hot air. You're encoding your codepoints with 'windows-1252', and then parsing it back to codepoints. The only change that may occur is that characters not defined in windows-1252 get converted to a '?' character in the process.

To me, the right solution might be the following one. Depending on what your function is used for, you could pass a Stream as a parameter to your function, so that the caller decides whether it should be written to memory or to a file. So it would be written like this:


        public static void WriteFieldsAsXmlDocument(ICollection fields, Stream outStream)
        {
            XmlWriterSettings settings = new XmlWriterSettings();
            settings.Indent = true;
            settings.Encoding = Encoding.GetEncoding("windows-1250");

            using(XmlWriter writer = XmlWriter.Create(outStream, settings)) {
                writer.WriteStartDocument();
                writer.WriteStartElement("data");
                foreach (Field field in fields)
                {
                    writer.WriteStartElement("item");
                    writer.WriteAttributeString("name", field.Id);
                    writer.WriteAttributeString("value", field.Value);
                    writer.WriteEndElement();
                }
                writer.WriteEndElement();
            }
        }
MemoryStream memoryStream = new MemoryStream();
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings();
xmlWriterSettings.Encoding = Encoding.UTF8;

XmlWriter xmlWriter = XmlWriter.Create(memoryStream, xmlWriterSettings);
xmlWriter.WriteStartDocument();
xmlWriter.WriteStartElement("root", "http://www.timvw.be/ns");
xmlWriter.WriteEndElement();
xmlWriter.WriteEndDocument();
xmlWriter.Flush();
xmlWriter.Close();

string xmlString = Encoding.UTF8.GetString(memoryStream.ToArray());

From here

I actually solved the problem with MemoryStream:

public static string CreateOutputXmlString(ICollection<Field> fields)
        {
            XmlWriterSettings settings = new XmlWriterSettings();
            settings.Indent = true;
            settings.Encoding = Encoding.GetEncoding("windows-1250");

            MemoryStream memStream = new MemoryStream();
            XmlWriter writer = XmlWriter.Create(memStream, settings);

            writer.WriteStartDocument();
            writer.WriteStartElement("data");
            foreach (Field field in fields)
            {
                writer.WriteStartElement("item");
                writer.WriteAttributeString("name", field.Id);
                writer.WriteAttributeString("value", field.Value);
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
            writer.Flush();
            writer.Close();

            writer.Flush();
            writer.Close();

            string xml = Encoding.GetEncoding("windows-1250").GetString(memStream.ToArray());

            memStream.Close();
            memStream.Dispose();

            return xml;
        }

I solved mine by outputting the string to a variable then replacing any references to utf-16 with utf-8 (my app needed UTF8 encoding). Since you're using a function, you could do something similar. I use VB.net mostly, but I think the C# would look something like this.

return builder.ToString().Replace("utf-16", "utf-8");
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top