Efficiency of deserialization vs. XmlReader

https://stackoverflow.com/questions/1555433

20-09-2019

Question

I'm working with a complicated XML schema, for which I have created a class structure using xsd.exe (with some effort). I can now reliably deserialize the XML into the generated class structure. For example, consider the following XML from the web service:

<ODM FileType="Snapshot" CreationDateTime="2009-10-09T19:58:46.5967434Z" ODMVersion="1.3.0" SourceSystem="XXX" SourceSystemVersion="999">
  <Study OID="2">
    <GlobalVariables>
      <StudyName>Test1</StudyName>
      <StudyDescription/>
      <ProtocolName>Test0001</ProtocolName>
    </GlobalVariables>
    <MetaDataVersion OID="1" Name="Base Version" Description=""/>
    <MetaDataVersion OID="2" Name="Test0001" Description=""/>
    <MetaDataVersion OID="3" Name="Test0002" Description=""/>
  </Study>
</ODM>

I can deserialize the XML as follows:

public ODMcomplexTypeDefinitionStudy GetStudy(string studyId)
{
  ODMcomplexTypeDefinitionStudy study = null;
  ODM odm = Deserialize<ODM>(Service.GetStudy(studyId));
  if (odm.Study.Length > 0)
    study = odm.Study[0];
  return study;
}

Service.GetStudy() returns the HTTP response stream from the web service, and Deserialize<T>() is a helper method that deserializes the stream into an object of type T.
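The helper itself is not shown in the question. A minimal sketch of what such a helper might look like, assuming it is a thin wrapper around XmlSerializer (the class name XmlHelper and both overloads are hypothetical, not the original code):

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the Deserialize<T>() helper described above.
public static class XmlHelper
{
    // Deserializes an XML stream (e.g. an HTTP response stream) into T.
    public static T Deserialize<T>(Stream stream)
    {
        // The XmlSerializer(Type) constructor reuses its generated
        // serialization code internally, so constructing it per call is cheap
        // after the first use for a given type.
        var serializer = new XmlSerializer(typeof(T));
        return (T)serializer.Deserialize(stream);
    }

    // Deserializes an XML string into T; this matches the call shape
    // Deserialize<T>(element.ToString()) used with XElement.
    public static T Deserialize<T>(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            var serializer = new XmlSerializer(typeof(T));
            return (T)serializer.Deserialize(reader);
        }
    }
}
```

With a helper along these lines, XmlHelper.Deserialize<ODM>(Service.GetStudy(studyId)) would correspond to the call in GetStudy() above.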

My question is this: is it more efficient to let the deserialization process create the entire class structure and deserialize the whole XML document, or is it more efficient to grab only the XML of interest and deserialize just that? For example, I could replace the above code with:

public ODMcomplexTypeDefinitionStudy GetStudy(string studyId)
{
  ODMcomplexTypeDefinitionStudy study = null;
  using (XmlReader reader = XmlReader.Create(Service.GetStudy(studyId)))
  {
    XDocument xdoc = XDocument.Load(reader);
    XNamespace odmns = xdoc.Root.Name.Namespace;
    XElement elStudy = xdoc.Root.Element(odmns + "Study");
    study = Deserialize<ODMcomplexTypeDefinitionStudy>(elStudy.ToString());
  }
  return study;
}

I suspect that the first approach is preferred: there is a lot of DOM manipulation going on in the second example, and the deserialization process must have optimizations. However, what happens when the XML grows dramatically? Let's say the source returns 1 MB of XML and I'm really only interested in a very small component of it. Should I let the deserialization process fill up the containing ODM class with all its arrays and properties of child nodes, or just go get the child node as in the second example?

Not sure this helps, but here's a summary image of the dilemma:

[image not preserved]

Solution

Brett,

Later versions of .NET can build custom serializer assemblies at compile time. Open the project properties, go to the Build tab, look for "Generate serialization assemblies" and change it to On. The XML deserializer will then use these assemblies, which are tailored to the classes in your project. They are faster and less resource intensive, mainly on first use, because the serialization code no longer has to be generated and compiled at run time.

I would go this route so that if your classes change you will not have to worry about serialization issues. Performance should not be an issue.

OTHER TIPS

I recommend that you not preoptimize. If you have your code working, then use it as it is. Go on to work on some code that is not finished, or which does not work.

Later, if you find you have a performance problem in that area, you can explore performance.
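If measurement does show a problem with large documents, one option to explore sits between the two approaches in the question: advance an XmlReader to the element of interest and hand only that subtree to XmlSerializer, so neither the full ODM object graph nor an XDocument is built. The following is a sketch under stated assumptions, not a tested replacement: StudySketch is a hypothetical, cut-down stand-in for the xsd.exe-generated ODMcomplexTypeDefinitionStudy, and the XML is assumed to have no namespace, as in the sample shown. A real ODM document with a default namespace would need that namespace declared on the XmlRoot attribute.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical, cut-down stand-in for the generated study class.
[XmlRoot("Study")]
public class StudySketch
{
    [XmlAttribute("OID")]
    public string OID { get; set; }
}

public static class StudyReader
{
    // Returns the first <Study> element deserialized into StudySketch,
    // or null if the document contains no <Study> element.
    public static StudySketch ReadFirstStudy(Stream xml)
    {
        var serializer = new XmlSerializer(typeof(StudySketch));
        using (XmlReader reader = XmlReader.Create(xml))
        {
            // Move forward to the first <Study> element without building a DOM.
            if (!reader.ReadToFollowing("Study"))
                return null;

            // ReadSubtree() yields a reader limited to this one element,
            // so the serializer cannot read past its end tag.
            using (XmlReader subtree = reader.ReadSubtree())
            {
                return (StudySketch)serializer.Deserialize(subtree);
            }
        }
    }
}
```

Child elements that StudySketch does not declare are skipped by XmlSerializer by default, so the class only needs the members that are actually of interest. Whether this beats full deserialization for a given payload is something to measure rather than assume.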

Licensed under: CC-BY-SA with attribution
