Question

I have a scalability and processing problem and would like the opinion of the Stack Overflow community.

I have XML data coming down a socket, and I want to process it. Processing each XML line can involve writing to a text file, opening a socket to another server, and running various database queries, all of which take time.

At the moment my solution uses the following threads:

Thread 1: Accepts incoming sockets and spawns a child thread to handle each one (there will only be a couple of incoming sockets from clients). When an XML line comes through (via the ReadLine() method on StreamReader), the line is put into a Queue, which is accessed through a static method on a class. This static method contains locking logic to keep the program thread-safe (I could of course use ConcurrentQueue instead of manual locking).

Threads 2-5: Constantly take XML lines from the queue and process them one at a time (database queries, file writes, etc.).

This method seems to be working, but I was curious whether there is a better way of doing things, because this seems very crude. If I move the processing that threads 2-5 do into thread 1, performance becomes extremely slow, which I expected, so I created the worker threads (2-5).

I appreciate I could replace threads 2-5 with a thread pool, but the thread pool would still be reading from the same Queue of XML lines, so I wondered whether there is a more efficient way of processing these events than using the Queue.


Solution

A queue [1] is the right approach. But I would certainly move from manual thread control to the thread pool, so that you don't need to do thread management yourself, and let it manage the number of threads. [2]

But in the end there is only so much processing a single computer (however expensive) can do. At some point one of memory size, CPU-memory bandwidth, storage I/O, network I/O, … is going to be saturated. At that point, using an external queuing system (MSMQ, WebSphere MQ, RabbitMQ, …) with each task as a separate message allows many workers on many computers to process the data (the "competing consumers" pattern).
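As a rough illustration of the thread-pool suggestion above, here is a minimal sketch in which the reader hands each line straight to the pool, so the pool's own work queue replaces the hand-rolled one. The names PooledLineProcessor, ProcessLine and ReadAndDispatch are made up for this example, and the Thread.Sleep call only stands in for the real file, database and socket work.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class PooledLineProcessor
{
    private static int processedCount;

    // Number of lines processed so far (for the demo only).
    public static int ProcessedCount => Volatile.Read(ref processedCount);

    // Hypothetical stand-in for the real work (file write, database query, outbound socket).
    private static void ProcessLine(string xmlLine)
    {
        Thread.Sleep(10); // simulate slow I/O
        Interlocked.Increment(ref processedCount);
    }

    // Reads lines until the reader is exhausted and queues each one on the thread pool.
    // The returned task completes when every queued line has been processed.
    public static Task ReadAndDispatch(TextReader reader)
    {
        var pending = new List<Task>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string captured = line; // copy, so the lambda does not see later values of 'line'
            pending.Add(Task.Run(() => ProcessLine(captured)));
        }
        return Task.WhenAll(pending);
    }

    public static void Main()
    {
        // A StringReader stands in for the StreamReader wrapped around the socket.
        var input = new StringReader("<a/>\n<b/>\n<c/>");
        ReadAndDispatch(input).Wait();
        Console.WriteLine(ProcessedCount);
    }
}
```

Note that with this approach lines may finish in a different order from the one in which they arrived. If ordering matters for your data, this sketch is not enough as it stands.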


[1] I would move immediately to ConcurrentQueue: getting locking right is hard, and the less of it you have to do yourself the better.

[2] At some point you might find you need more control than the thread pool provides; that is the time to switch to a custom thread pool. But prototype and test: it is quite possible your implementation will actually be worse (see paragraph 2).
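For the ConcurrentQueue note, a minimal sketch of what the static queue class from the question might look like without any manual locking. XmlLineQueue, Add and TryTake are hypothetical names; only ConcurrentQueue<T>, Enqueue and TryDequeue come from the framework.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class XmlLineQueue
{
    // ConcurrentQueue<T> is safe for concurrent producers and consumers without a lock statement.
    private static readonly ConcurrentQueue<string> Lines = new ConcurrentQueue<string>();

    // Called by the socket-reading threads.
    public static void Add(string xmlLine) => Lines.Enqueue(xmlLine);

    // Called by the workers; returns false immediately when the queue is empty.
    public static bool TryTake(out string xmlLine) => Lines.TryDequeue(out xmlLine);
}

public static class Demo
{
    public static void Main()
    {
        // Four producers enqueue 100 lines each, concurrently.
        Parallel.For(0, 4, producer =>
        {
            for (int i = 0; i < 100; i++)
                XmlLineQueue.Add($"<msg producer=\"{producer}\" n=\"{i}\"/>");
        });

        // Drain the queue and count what arrived.
        int drained = 0;
        while (XmlLineQueue.TryTake(out string line)) drained++;
        Console.WriteLine(drained);
    }
}
```

TryDequeue never blocks, so a worker that finds the queue empty has to decide how to wait. If you would rather have workers block until a line arrives, BlockingCollection<T> (which wraps a ConcurrentQueue<T> by default) is probably worth a look.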

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow