Question

I need to progressively process XML loaded from a ResponseStream in async callbacks.

The reply has the following schema:

  <root>
     <node ...>
        .....
     </node>
     <node />
     ...
  </root>

and I need to be able to process the <node> elements before the complete document has arrived.

Is there a reasonable way to parse it using standard .NET?


Solution

System.Xml.XmlTextReader 

"Represents a reader that provides fast, non-cached, forward-only access to XML data."

http://msdn.microsoft.com/en-us/library/system.xml.xmltextreader.aspx

Edit: This is a quick hack, but it does demonstrate that the reader is in fact lazy.

    using System;
    using System.IO;
    using System.Xml;

    public class XmlTextReaderTest
    {
        public void RunTest()
        {
            // Wrap the file in the logging stream below so we can see when the
            // reader actually pulls bytes from the underlying stream.
            var reader = new XmlTextReader(new Fs(@"c:\TestXml.xml"));
            while (reader.Read())
                File.AppendAllText(@"c:\xLog.txt", "Processing node..." + Environment.NewLine);
        }
    }

    // A FileStream that logs every read, showing that XmlTextReader only pulls
    // data from the stream as it needs it.
    public class Fs : FileStream
    {
        public Fs(string path)
            : base(path, FileMode.Open)
        {
        }

        public override int Read(byte[] array, int offset, int count)
        {
            File.AppendAllText(@"c:\xLog.txt", "Reading from stream..." + Environment.NewLine);
            var ans = base.Read(array, offset, count);
            return ans;
        }
    }

Other tips

Don't do this in async callbacks; you don't need to (trust me, this will become clearer...).

The ResponseStream will load as information becomes available. In the case of a small stream (for quite large values of "small", I'm afraid) that is not sent chunked, this will be when the entire stream has been downloaded. However, if the stream is sent with a chunked transfer-encoding (this is what happens in ASP.NET if buffering is turned off or Response.Flush() is called; other web-server technologies have their equivalents), then the stream will be available from the first chunk.
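For illustration only, a hypothetical ASP.NET handler like the sketch below (the handler name and GetItems() are made up) ends up sending a chunked response simply by turning buffering off and flushing after each node:

    // Hypothetical ASP.NET handler, only to illustrate how a server ends up
    // sending a chunked response: buffering off + Flush() per item.
    using System.Web;

    public class NodesHandler : IHttpHandler
    {
        public bool IsReusable { get { return true; } }

        public void ProcessRequest(HttpContext context)
        {
            context.Response.ContentType = "text/xml";
            context.Response.BufferOutput = false;          // don't buffer the whole document
            context.Response.Write("<root>");
            foreach (var item in GetItems())                // GetItems() is a placeholder
            {
                context.Response.Write("<node>" + item + "</node>");
                context.Response.Flush();                   // each flush goes out as a chunk
            }
            context.Response.Write("</root>");
        }

        private static string[] GetItems() { return new[] { "a", "b", "c" }; }
    }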

Create your XmlReader from the ResponseStream once GetResponse() has returned. It will start processing as soon as the first chunk is available, and will obtain subsequent chunks as they arrive, quite transparently to your code.
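As a minimal sketch (the URL and the element name "node" are placeholders), that looks something like this:

    using System;
    using System.Net;
    using System.Xml;

    class Program
    {
        static void Main()
        {
            var request = (HttpWebRequest)WebRequest.Create("http://example.com/data.xml");
            using (var response = request.GetResponse())
            using (var stream = response.GetResponseStream())
            using (var reader = XmlReader.Create(stream))
            {
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.Name == "node")
                    {
                        // Handle the node here; Read() only blocks until the
                        // next chunk of the response is available.
                        Console.WriteLine("Got a <node> at depth {0}", reader.Depth);
                    }
                }
            }
        }
    }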

Make sure that dealing with these nodes on an as-available basis actually benefits the code further down the line. E.g. if you are outputting to a console or a form, do so as each node (or small batch of nodes) is processed, whereas if you are creating objects from these nodes, then yield return them rather than building up a collection.
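A rough sketch of the yield return approach (MyItem and the "id" attribute are assumptions, not part of the original question):

    using System.Collections.Generic;
    using System.Xml;

    public class MyItem
    {
        public string Id;
    }

    public static class NodeStreamer
    {
        public static IEnumerable<MyItem> ReadItems(XmlReader reader)
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "node")
                {
                    // Each item is handed to the caller as soon as its element
                    // has been read; no collection is accumulated.
                    yield return new MyItem { Id = reader.GetAttribute("id") };
                }
            }
        }
    }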

Now, the big thing here is clearly whether the web stream is chunked, rather than your processing code. If the producer is another party that cannot be persuaded to do this, then you will need to drop to a lower level in your processing. However, if this is the case then doing so is quite likely a false optimisation, as all of the processing will be done on their end before they send the first byte, and that is where the biggest delay will be. Really, if the delay in getting the entire response downloaded is a problem for your code, then you need them to start sending it chunked, as the delay with even the most efficient approach on your part will still be too great.

For the record, I've quite recently confirmed that in such a use of XmlReader on a WebResponse dealing with chunked data (where I controlled both the client and server code, and could have both running in a debugger and check on the order of operation), the processing is indeed done as each chunk is available.

Yes, there is a reader that you can use. Basically, it goes along a stream and raises an event for every item it identifies (element, attribute, etc.).
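In standard .NET this forward-only reading is usually done with XmlReader, which is pull-based rather than truly event-based, but the effect is similar. A sketch (the file name and element name are assumptions) that hands each <node> to you via ReadSubtree():

    using System;
    using System.Xml;

    class SubtreeDemo
    {
        static void Main()
        {
            using (var reader = XmlReader.Create("TestXml.xml"))   // file name is a placeholder
            {
                while (reader.ReadToFollowing("node"))
                {
                    using (var node = reader.ReadSubtree())
                    {
                        while (node.Read())
                        {
                            // React to each element/attribute within this <node>.
                            Console.WriteLine("{0}: {1}", node.NodeType, node.Name);
                        }
                    }
                }
            }
        }
    }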
