Question

Consider this problem: I have a program which should fetch (let's say) 100 records from a database, and then for each one it should get updated information from a web service. There are two ways to introduce parallelism in this scenario:

  1. I start each request to the web service on a new Thread. The number of simultaneous threads is controlled by some external parameter (or dynamically adjusted somehow).

  2. I create smaller batches (let's say of 10 records each) and launch each batch on a separate thread (so taking our example, 10 threads).

Which is a better approach, and why do you think so?

Solution

Option 3 is the best:

Use Async IO.

Unless your request processing is complex and heavy, your program is going to spend 99% of its time waiting for the HTTP requests.

This is exactly what Async IO is designed for: let the Windows networking stack (or the .NET framework, or whatever) worry about all the waiting, and just use a single thread to dispatch and 'pick up' the results.

Unfortunately the .NET framework makes it a right pain in the ass. It's easier if you're just using raw sockets or the Win32 API. Here's a (tested!) example using C# 3 anyway:

using System;             // Console
using System.Diagnostics; // Debug.Assert
using System.Net;         // WebRequest / HttpWebRequest

// need to declare a class so we can cast our state object back out
class RequestState {
    public WebRequest Request { get; set; }
}

static void Main( string[] args ) {
    // stupid cast necessary because WebRequest.Create returns the base WebRequest type
    HttpWebRequest request = WebRequest.Create( "http://www.stackoverflow.com" ) as HttpWebRequest;

    request.BeginGetResponse(
        /* callback to be invoked when finished */
        (asyncResult) => { 
            // fetch the request object out of the AsyncState
            var state = (RequestState)asyncResult.AsyncState; 
            var webResponse = state.Request.EndGetResponse( asyncResult ) as HttpWebResponse;

            // there we go;
            Debug.Assert( webResponse.StatusCode == HttpStatusCode.OK ); 

            Console.WriteLine( "Got Response from server:" + webResponse.Server );
        },
        /* pass the request through to our callback */
        new RequestState { Request = request }  
    );

    // blah
    Console.WriteLine( "Waiting for response. Press a key to quit" );
    Console.ReadKey();
}

EDIT:

In the case of .NET, the 'completion callback' actually fires on a ThreadPool thread, not on your main thread, so you will still need to lock any shared resources, but it still saves you all the trouble of managing threads yourself.
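For instance, if each callback appends its result to a shared list, guard it with a lock. This is a minimal sketch (the `ResultCollector` name and its members are invented for illustration):

```csharp
using System.Collections.Generic;

class ResultCollector {
    private readonly object _sync = new object();
    private readonly List<string> _results = new List<string>();

    // called from ThreadPool threads by the completion callbacks
    public void Add( string result ) {
        lock ( _sync ) { _results.Add( result ); }
    }

    public int Count {
        get { lock ( _sync ) { return _results.Count; } }
    }
}
```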

OTHER TIPS

Two things to consider.

1. How long will it take to process a record?

If record processing is very quick, the overhead of handing off records to threads can become a bottleneck. In this case, you would want to bundle records so that you don't have to hand them off so often.

If record processing is reasonably long-running, the difference will be negligible, so the simpler approach (1 record per thread) is probably the best.

2. How many threads are you planning on starting?

If you aren't using a thread pool, I think you either need to limit the number of threads manually, or you need to break the data into big chunks. Starting a new thread for every record will leave your system thrashing if the number of records gets large.
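One way to cap concurrency without managing threads by hand is to put a semaphore in front of the ThreadPool. A sketch, assuming .NET 2.0's `System.Threading.Semaphore` (the `ThrottledRunner` name is invented):

```csharp
using System;
using System.Threading;

class ThrottledRunner {
    private readonly Semaphore _slots;

    public ThrottledRunner( int maxConcurrent ) {
        _slots = new Semaphore( maxConcurrent, maxConcurrent );
    }

    public void Run( Action work ) {
        _slots.WaitOne(); // blocks once maxConcurrent items are already in flight
        ThreadPool.QueueUserWorkItem( _ => {
            try { work(); }
            finally { _slots.Release(); } // free the slot for the next record
        } );
    }
}
```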

The computer running the program is probably not the bottleneck, so: remember that the HTTP protocol has a keep-alive header that lets you send several GET requests over the same socket, which saves you the TCP/IP handshake. Unfortunately I don't know how to use that in the .NET libraries. (It should be possible.)
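For what it's worth, `HttpWebRequest` does expose a `KeepAlive` property (which defaults to true), and `ServicePointManager.DefaultConnectionLimit` controls how many pooled connections you get per host, so something like the following should reuse connections; treat the numbers as placeholders:

```csharp
using System.Net;

// allow more than the default 2 pooled connections per host
ServicePointManager.DefaultConnectionLimit = 10;

HttpWebRequest request = (HttpWebRequest)WebRequest.Create( "http://www.stackoverflow.com" );
request.KeepAlive = true; // the default; lets the framework reuse the pooled TCP connection
```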

There will probably also be a delay in answering your requests. You could try making sure that you always have a given number of outstanding requests to the server.

Get the Parallel Extensions (Parallel FX). Look at BlockingCollection. Use a thread to feed it batches of records, and 1 to n threads pulling records off the collection to service them. You can control the rate at which the collection is fed and the number of threads that call the web service. Make it configurable via a ConfigSection, and make it generic by feeding the collection Action delegates, and you'll have a nice little batcher you can reuse to your heart's content.
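A minimal sketch of that producer/consumer shape, assuming .NET 4's `BlockingCollection<T>` and `Task` (the `Batcher` name is invented, and the ConfigSection wiring is omitted):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class Batcher {
    // bounded capacity throttles the feeder when consumers fall behind
    public static void Process( IEnumerable<Action> work, int consumers, int capacity ) {
        var queue = new BlockingCollection<Action>( capacity );

        var feeder = Task.Factory.StartNew( () => {
            foreach ( var item in work )
                queue.Add( item ); // blocks when the collection is full
            queue.CompleteAdding();
        } );

        var workers = new Task[consumers];
        for ( int i = 0; i < consumers; i++ )
            workers[i] = Task.Factory.StartNew( () => {
                foreach ( var action in queue.GetConsumingEnumerable() )
                    action(); // e.g. call the web service for one record
            } );

        Task.WaitAll( workers );
        feeder.Wait();
    }
}
```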

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow