Question

I am opening an XML file with the .NET XmlReader and saving it under another filename, and the DOCTYPE declaration changes between the two files. The newly saved file is still valid XML, but I was wondering why the original declaration gets rewritten.

    Dim oXmlSettings As Xml.XmlReaderSettings = New Xml.XmlReaderSettings()
    oXmlSettings.XmlResolver = Nothing
    oXmlSettings.CheckCharacters = False
    oXmlSettings.ProhibitDtd = False
    oXmlSettings.IgnoreWhitespace = True

    Dim oXmlDoc As XmlReader = XmlReader.Create(pathToOriginalXml, oXmlSettings)
    Dim oDoc As XmlDocument = New XmlDocument()
    oDoc.Load(oXmlDoc)
    oDoc.Save(pathToNewXml)

The following (in the original document):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.1//EN" "http://www.w3.org/TR/xhtml-basic/xhtml-basic11.dtd">

becomes (notice the [ ] characters at the end):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.1//EN" "http://www.w3.org/TR/xhtml-basic/xhtml-basic11.dtd"[]>

Solution

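Most likely the library does not keep the original text of the declaration. It parses the DOCTYPE declaration into an internal structure (in XmlDocument, an XmlDocumentType node holding the name, public ID, system ID and internal subset) and regenerates the text from that structure when the document is saved. The [] in the output is an empty internal subset being written out.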
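To see the parsed form for yourself, a minimal sketch like the one below loads the DOCTYPE from the question and prints the fields of the resulting XmlDocumentType node. The class name DocTypeInspection, the helper LoadWithoutResolver and the sample markup are made up for illustration, and DtdProcessing.Parse is used on the assumption that you are on .NET 4 or later, where it replaces ProhibitDtd = False.

```csharp
using System;
using System.IO;
using System.Xml;

public static class DocTypeInspection
{
    // Sample document (an assumption for this sketch) using the DOCTYPE from the question.
    public const string SampleXml =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML Basic 1.1//EN\" " +
        "\"http://www.w3.org/TR/xhtml-basic/xhtml-basic11.dtd\">" +
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>t</title></head><body/></html>";

    // Hypothetical helper: loads XML the way the question does, with no resolver,
    // so the external DTD is never fetched.
    public static XmlDocument LoadWithoutResolver(string xml)
    {
        var settings = new XmlReaderSettings
        {
            XmlResolver = null,
            DtdProcessing = DtdProcessing.Parse, // newer equivalent of ProhibitDtd = false
            IgnoreWhitespace = true
        };

        var doc = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            doc.Load(reader);
        }
        return doc;
    }

    public static void Main()
    {
        XmlDocumentType docType = LoadWithoutResolver(SampleXml).DocumentType;

        // These properties are all that survives of the original declaration text.
        Console.WriteLine("Name:           " + docType.Name);
        Console.WriteLine("PublicId:       " + docType.PublicId);
        Console.WriteLine("SystemId:       " + docType.SystemId);
        Console.WriteLine("InternalSubset: " +
            (docType.InternalSubset == null ? "(null)" : "\"" + docType.InternalSubset + "\""));
    }
}
```

If InternalSubset prints as an empty string rather than (null), that would explain the [] in the saved file.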

OTHER TIPS

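There is a bug in System.Xml that shows up when you set XmlDocument.XmlResolver = null (the question does the equivalent on XmlReaderSettings): the DOCTYPE is saved with an empty internal subset, which is written as []. The workaround is to save through a custom XmlTextWriter that turns an empty subset back into null: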

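    // Requires: using System.Text; using System.Xml;
    // Declare this as a nested class, or change "private" to "internal" at namespace level.
    private class NullSubsetXmlTextWriter : XmlTextWriter
    {
        // The file name is the output file that the writer creates.
        public NullSubsetXmlTextWriter(string outputFileName, Encoding encoding)
            : base(outputFileName, encoding)
        {
        }

        public override void WriteDocType(string name, string pubid, string sysid, string subset)
        {
            // An empty subset would be written as "[]"; null omits the brackets entirely.
            if (subset == String.Empty)
            {
                subset = null;
            }
            base.WriteDocType(name, pubid, sysid, subset);
        }
    }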

In your code, create a new NullSubsetXmlTextWriter(pathToNewXml, Encoding.UTF8), pass that object to the oDoc.Save() method, and close the writer afterwards so the output is flushed to the file.
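Put together, the save step might look like the sketch below. It repeats the writer class so that it compiles on its own. The names SaveDemo, Load and SaveWithoutEmptySubset, the sample markup and the temporary output file are assumptions for illustration, not part of the original code.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Same writer as in the answer, repeated here so the sketch is self-contained.
public class NullSubsetXmlTextWriter : XmlTextWriter
{
    public NullSubsetXmlTextWriter(string outputFileName, Encoding encoding)
        : base(outputFileName, encoding)
    {
    }

    public override void WriteDocType(string name, string pubid, string sysid, string subset)
    {
        // Null makes the base writer skip the "[...]" part of the declaration.
        base.WriteDocType(name, pubid, sysid, subset == string.Empty ? null : subset);
    }
}

public static class SaveDemo
{
    // Sample document (an assumption for this sketch) using the DOCTYPE from the question.
    public const string SampleXml =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML Basic 1.1//EN\" " +
        "\"http://www.w3.org/TR/xhtml-basic/xhtml-basic11.dtd\">" +
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>t</title></head><body/></html>";

    // Hypothetical helper: loads without a resolver, as in the question.
    public static XmlDocument Load(string xml)
    {
        var settings = new XmlReaderSettings
        {
            XmlResolver = null,
            DtdProcessing = DtdProcessing.Parse // newer equivalent of ProhibitDtd = false
        };
        var doc = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            doc.Load(reader);
        }
        return doc;
    }

    // Hypothetical helper: saves through the custom writer and disposes it,
    // because XmlDocument.Save(XmlWriter) does not close the writer itself.
    public static void SaveWithoutEmptySubset(XmlDocument doc, string path)
    {
        using (var writer = new NullSubsetXmlTextWriter(path, Encoding.UTF8))
        {
            doc.Save(writer);
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName(); // stand-in for pathToNewXml
        SaveWithoutEmptySubset(Load(SampleXml), path);
        Console.WriteLine(File.ReadAllText(path));
    }
}
```

The using block is there so the file is flushed and released even if Save throws.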

A Microsoft support case covers this issue; it describes the workaround but doesn't provide the code.

Licensed under: CC-BY-SA with attribution