Question

Can anyone give me any pointers to resources detailing the structures and algorithms used "under the hood" of the .NET DataSet class?

I'm currently working in a language that doesn't have an implementation of a generic in-memory data structure like the DataSet class. I may end up having to build one myself (but not as extensive!), but I would like to have a better idea of how existing systems are put together before I start hacking away. (Thank you Douglas Hofstadter)

I'm reading through the Mono implementation, but any other links, articles, or ideas would be appreciated.


Solution

Fire up Reflector and go straight to the source...

OTHER TIPS

You can legally download the original source code for DataSet from Microsoft, which will give you access to everything, including original comments. However, if you are implementing a similar system/product, you should carefully study the license to ensure you aren't likely to breach it by looking at the source code.

Reflector is available at http://www.red-gate.com/products/reflector/index.htm, and will allow you to see exactly how DataSet is implemented.

You will also want to look at DataAdapter (especially DbDataAdapter.Fill), DataTable (especially DataTable.Load), DataRow, and DbDataReader. You might then want to look at some of the specific implementations, such as SqlDataAdapter and SqlCommand.
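If you are building your own version, the general shape of a Fill/Load step can be sketched roughly as follows: take the column names from a forward-only reader, then copy each record into an in-memory table. This is only a loose Python analogue using the standard sqlite3 module, not the actual .NET code; the Table class and its load method are hypothetical names made up for the illustration.

```python
import sqlite3


class Table:
    """Hypothetical minimal in-memory table: column names plus a list of rows."""

    def __init__(self, name):
        self.name = name
        self.columns = []
        self.rows = []

    def load(self, cursor):
        """Copy every record from a forward-only cursor; return the number loaded."""
        # Build the schema from the reader on first load.
        # cursor.description is a sequence of tuples whose first item is the column name.
        if not self.columns:
            self.columns = [description[0] for description in cursor.description]
        loaded = 0
        for record in cursor:
            self.rows.append(list(record))
            loaded += 1
        return loaded


# Usage: fill the table from an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, "apple"), (2, "pear")])

table = Table("product")
loaded = table.load(conn.execute("SELECT id, name FROM product ORDER BY id"))
conn.close()
```

After the connection is closed the table still holds its own copy of the data, which is the disconnected behaviour you are after.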

Note that DataSet is proof of the old adage about everything looking like a nail. It was over-used in .NET. In particular, do not use it to transfer data between a web service and its clients - it does not interoperate well.

The key data structure of the ADO.NET DataSet is the red-black tree.

It is complex, but it has a good worst-case running time for its operations and is efficient in practice: it can search, insert, and delete in O(log n) time, where n is the total number of elements in the tree. Put very simply, a red-black tree is a binary search tree that inserts and removes intelligently, to ensure the tree stays reasonably balanced.
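To give a feel for what "inserts intelligently" means, here is a minimal sketch of an index built on a red-black tree. It assumes the left-leaning variant described by Sedgewick, chosen because it is short; it is not the DataSet's actual code, it covers insert and search only (no delete), and the RedBlackIndex name and its methods are hypothetical.

```python
RED, BLACK = True, False


class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.left = None
        self.right = None
        self.color = RED  # new nodes always start out red


def is_red(node):
    return node is not None and node.color == RED


def rotate_left(h):
    # Turn a right-leaning red link into a left-leaning one.
    x = h.right
    h.right = x.left
    x.left = h
    x.color = h.color
    h.color = RED
    return x


def rotate_right(h):
    # Mirror image of rotate_left.
    x = h.left
    h.left = x.right
    x.right = h
    x.color = h.color
    h.color = RED
    return x


def flip_colors(h):
    # Both children red: push the red link up towards the root.
    h.color = RED
    h.left.color = BLACK
    h.right.color = BLACK


def _insert(h, key, value):
    if h is None:
        return Node(key, value)
    if key < h.key:
        h.left = _insert(h.left, key, value)
    elif key > h.key:
        h.right = _insert(h.right, key, value)
    else:
        h.value = value  # existing key: overwrite
    # Restore the balance rules on the way back up the tree.
    if is_red(h.right) and not is_red(h.left):
        h = rotate_left(h)
    if is_red(h.left) and is_red(h.left.left):
        h = rotate_right(h)
    if is_red(h.left) and is_red(h.right):
        flip_colors(h)
    return h


class RedBlackIndex:
    """Hypothetical index mapping a column value to a row id."""

    def __init__(self):
        self.root = None

    def insert(self, key, row_id):
        self.root = _insert(self.root, key, row_id)
        self.root.color = BLACK  # the root is always black

    def find(self, key):
        node = self.root
        while node is not None:
            if key < node.key:
                node = node.left
            elif key > node.key:
                node = node.right
            else:
                return node.value
        return None

    def height(self):
        def walk(node):
            return 0 if node is None else 1 + max(walk(node.left), walk(node.right))
        return walk(self.root)


# Usage: keys inserted in sorted order, the worst case for a plain binary search tree.
index = RedBlackIndex()
for row_id, key in enumerate(range(1000)):
    index.insert(key, row_id)
```

A plain binary search tree fed those 1000 sorted keys would degenerate into a 1000-node chain; the rotations and color flips keep this tree's height logarithmic in the number of keys.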

The book Programming Microsoft ADO.NET 2.0 Core Reference has an excellent description of datasets, including many of their issues and internal data structures.

The core of the DataSet implementation relies on self-balancing red-black trees.

The other large caveat the book mentions is that looking up data by string name is not implemented efficiently, as it uses string comparison to search a lookup table for the matching column. You can get quite good increases in lookup performance on both tables and columns by hardcoding their numeric indexes instead of their names. Of course, this is a maintenance nightmare unless you write a tool to do it for you.
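A middle ground between the two is to resolve each name to its index once, outside the loop, and then use the index for every row. The sketch below illustrates the idea with a hypothetical Row class whose name lookup is a linear scan with string comparison, as the book reportedly describes; it is an assumption made for illustration, not the real DataRow code.

```python
class Row:
    """Hypothetical row: values stored by position, names resolved by scanning."""

    def __init__(self, column_names, values):
        self._column_names = column_names
        self._values = values

    def ordinal(self, name):
        # Linear scan with a case-insensitive string comparison per column.
        wanted = name.lower()
        for position, candidate in enumerate(self._column_names):
            if candidate.lower() == wanted:
                return position
        raise KeyError(name)

    def __getitem__(self, key):
        if isinstance(key, int):
            return self._values[key]  # direct positional access
        return self._values[self.ordinal(key)]  # name lookup on every access


columns = ["id", "name", "price"]
rows = [Row(columns, [i, f"item{i}", i * 1.5]) for i in range(5)]

# Slow path: the column name is resolved again for every row.
total_by_name = sum(row["price"] for row in rows)

# Faster path: resolve the name once, then index by position.
price_ordinal = rows[0].ordinal("price")
total_by_ordinal = sum(row[price_ordinal] for row in rows)
```

This keeps the name in exactly one place, so it avoids most of the maintenance problem of hardcoded numbers while still skipping the per-row string comparisons.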

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow