Question

I've seen several posts here on SO about loading XML documents from some data source where the data has Microsoft's proprietary UTF-8 preamble (for instance, this one).
However, I can't find an elegant (and working!) solution which does not involve stripping out the BOM characters manually.

For instance, there is this example:

byte[] b = System.IO.File.ReadAllBytes("c:\\temp_file_containing_bom.txt");
using (System.IO.MemoryStream oByteStream = new System.IO.MemoryStream(b)) {
    using (System.Xml.XmlTextReader oRD = new System.Xml.XmlTextReader(oByteStream)) {
        System.Xml.XmlDocument oDoc = new System.Xml.XmlDocument();
        oDoc.Load(oRD);
        Console.WriteLine(oDoc.OuterXml);
        Console.ReadLine();
    }
}

...but it still keeps throwing an "invalid data" exception.

My problem is that I have a huge byte array which sometimes contains the BOM and sometimes does not. I need to load it into an XmlDocument, and I don't believe that I should be the one who has to take care of the "helper" bytes.


Solution

That BOM is no longer 'proprietary'. It's written up in the XML specs. Only old versions of Java (1.4) have a problem with it. It's pretty humorous if you've got MS technology exploding on it.

Use a buffered input stream to filter out the BOM: read the leading bytes, and push them back so the parser still sees them if they are not the start of the BOM sequence.
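Since the data in the question is already a byte array, the same idea can be sketched in C# without a push-back stream: look at the first three bytes and, if they are the UTF-8 BOM (EF BB BF), start the MemoryStream just past them. This is only a sketch under assumptions. The names BomSafeXml, SkipUtf8Bom and Load are hypothetical helpers, not part of the framework, and only the UTF-8 BOM is handled (UTF-16/UTF-32 marks are left for the XML parser to deal with).

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class, named here for illustration only.
static class BomSafeXml
{
    // Returns a read-only stream over the same array, starting just past
    // a leading UTF-8 BOM (EF BB BF) if one is present. No bytes are copied,
    // which matters when the array is large.
    public static Stream SkipUtf8Bom(byte[] data)
    {
        int offset = 0;
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        {
            offset = 3;
        }
        return new MemoryStream(data, offset, data.Length - offset, false);
    }

    // Loads an XmlDocument from bytes that may or may not start with a UTF-8 BOM.
    public static XmlDocument Load(byte[] data)
    {
        XmlDocument doc = new XmlDocument();
        using (Stream stream = SkipUtf8Bom(data))
        {
            doc.Load(stream);
        }
        return doc;
    }
}
```

With this in place, the example from the question would reduce to something like `XmlDocument oDoc = BomSafeXml.Load(b);`. Whether this removes the "invalid data" exception depends on what is actually in the bytes, for example a BOM that appears somewhere other than the very start would not be affected.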

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow