Question

In an application I am working on, it needs to read data from about 5 to 10 tables in the same SQL Server database, load them into multiple List<> and process the data afterwards. The number of rows returned from each table varies, 1 row for some and about 1,000 rows for some others.

The application is written in C# 4.0, so I was thinking of using Task Parallel Library and start multiple tasks to retrieve data from these tables simultaneously. I thought it would be faster than calling a single query that returns the data from these tables in a sequence, but don't know for sure. So my questions are:

  1. Considering the overheads such as opening multiple connections to the database, would it really be faster to retrieve data using multiple tasks?
  2. If the answer is Yes in some cases, what are the cases where the answer is No?
  3. If the answer is No, is there any alternative approach?
  4. Any other things to consider when using multiple tasks to retrieve data?

To narrow down the scope, here are some assumptions you can make:

  • Data processing is not part of the question. Just the data retrieval alone.
  • Some of these tables are related with FKs. Like Parent <-- Child <-- Grandchild
  • These tables only contain varchar and numeric columns, and the row size is less than 400 bytes.
  • Plain ADO.NET (e.g. SqlConnection/SqlCommand/SqlDataReader) is used to retrieve the data from each table
  • Each task will call a self-contained method that reads the data from a SqlDataReader and returns a List of objects.
  • The database server is powerful enough to handle all concurrent connections and queries.
  • The client computer is powerful enough to handle the data operation if using single query to retrieve all data in a sequence. And it is capable of doing multi-threading.
  • Network is fast enough for transmitting data if using single query

Thank you in advance for your time and input!

Was it helpful?

Solution

Start off by saying I dont think this is what you need to focus on for this scenario. The amount of data does not seem to be an issue here, so I would focus on somewhere else.

To answer some of your questiosn. Yes, parallel loading can improve performance, but typically in instances where the data (row count) is a lot larger. You do need to be aware of the memory foot print though, as you dont want to kill the server with all the data in memory.

If you were running this mutiple times and quite frequently, I would say that then you need to focus on this.

As always, don't optimize until it becomes an issue.

It is more important to write code that is understandable and maintainable. Some months down the line you are going to thank your stars when you need to revist this code.

Also, what have you tried?

Have you done any benchmark testing? Write a small app and loop both cases several times (hundreds or even thousands) and measure the time it takes. Use a Stopwatch and see what the time differences are.

OTHER TIPS

10 tables with 1000 rows at max sounds like a small volume of data. Instead of opening multiple connections and performing parallel processing, I would rather have only one stored procedure returning all different tables and read it one-by-one using dataReader.NextResult().

See accepted answer on following StackOverflow question Multiples Table in DataReader

I am going to say "it depends" on the scenario. Let me explain, spawning of several parallel retrieval tasks and subsequent processing will be "faster" if a task does not depend on the query or processing result from another parallel task. If you go this route check out slightly dated but still relevant "Concurrency & Coordination With Futures in C#" and C# async, await feature.

Some part of me is thinking you introducing complexity that can be explored/investigated when the need arises and engineer only what is needed for now.

Before going the parallel route, please explore the possibility of lazy loading for different parts of the "web page" (I am assuming a web app) using jQuery/ajax to increase the perceived responsiveness of the web page.

No matter which approach you choose, its pertinent that you profile the web page using YSlow or similar tools.

Sorry for the rant!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top