Question

I'm trying to convert an HTML stream to XML using SgmlReader for further parsing. This conversion is part of an app I'm developing for the Windows 8 Store. Below is the method that converts the HTML to XML:

public static void ConvertToXml(string webResponse)
{
    StringWriter sWriter = new StringWriter();
    XmlWriter xmlWriter = XmlWriter.Create(sWriter);
    SgmlReader sgmlReader = new SgmlReader();
    sgmlReader.DocType = "HTML";
    sgmlReader.WhitespaceHandling = WhitespaceHandling.All;
    sgmlReader.CaseFolding = CaseFolding.ToLower;
    sgmlReader.InputStream = new StringReader(webResponse);
    sgmlReader.IgnoreDtd = true;
    while (!sgmlReader.EOF)
    {
        xmlWriter.WriteNode(sgmlReader, true);
    }
    xmlWriter.Flush();
    XmlString = sWriter.ToString();
}

The line sgmlReader.WhitespaceHandling = WhitespaceHandling.All; is the problem, because Xml.WhitespaceHandling is not present in a Windows Store app. Is there any other way to do this?


Solution

After a lot of reading, testing, and debugging, I found that sgmlReader.WhitespaceHandling = WhitespaceHandling.All; is not needed, at least in my case, because sgmlReader.WhitespaceHandling is set to All by default. I also removed sgmlReader.IgnoreDtd = true; and now my XML output looks normal ;)
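For reference, here is a minimal sketch of how the method might look with those two lines removed. It uses only the calls from the question; the HtmlConverter class name, the using blocks, and returning the string (instead of assigning the XmlString field) are my own assumptions for illustration, not part of the original code.

```csharp
using System.IO;
using System.Xml;
using Sgml;

public static class HtmlConverter // hypothetical wrapper class, for illustration only
{
    public static string ConvertToXml(string webResponse)
    {
        using (var sWriter = new StringWriter())
        {
            using (var xmlWriter = XmlWriter.Create(sWriter))
            {
                var sgmlReader = new SgmlReader();
                sgmlReader.DocType = "HTML";
                // WhitespaceHandling is left at its default (reported above to be All).
                sgmlReader.CaseFolding = CaseFolding.ToLower;
                sgmlReader.InputStream = new StringReader(webResponse);
                // IgnoreDtd is intentionally not set; removing it fixed the output in my case.

                // Copy every node from the SGML reader into the XML writer.
                while (!sgmlReader.EOF)
                {
                    xmlWriter.WriteNode(sgmlReader, true);
                }
                xmlWriter.Flush();
            }
            // Read the result after the XmlWriter has been flushed and disposed.
            return sWriter.ToString();
        }
    }
}
```

Calling HtmlConverter.ConvertToXml(html) should then give back the XML text, which can be assigned to XmlString as in the question.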

Hope this helps someone.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow