How do I handle WCF Call lifecycles under load when timeouts are expected?

https://stackoverflow.com/questions/12779844

05-07-2021

Question

I have a nice, fast task-scheduling component (a Windows service as it happens, but that is irrelevant) that subscribes to an in-memory queue of things to do.

The queue is populated really fast ... and when I say fast I mean fast ... so fast that I'm experiencing problems with one particular part.

Each item in the queue gets a "category" attached to it and is then passed to a WCF endpoint to be processed and saved in a remote db.

This is presenting a bit of a problem.

The "queue" can be processed in the millions of items per minute whereas the WCF endpoint will only realistically handle about 1000 to 1200 items per second and many of those are "stacked" in order to wait for a slot to dump them to the db.

My WCF client is configured so that the call is fire and forget (deliberately). My problem is that when the call is made, a timeout occasionally occurs, and that's when the headaches begin.

After the timeout the thread just seems to stop: it never drops into my catch block, it just sits there. What's even more confusing is that this is intermittent. It only happens when the queue is dealing with extreme loads and the WCF endpoint is overtaxed, and even in that scenario it only happens about once a fortnight.

This code is constantly running on the server, round the clock 24/7.

So ... my question ... How can I identify the edge case that is causing my problem so that I can resolve it?

Some extra info:

The client calling the WCF endpoint seems to automatically "throttle itself" because I'm limiting the number of threads making calls, and the code hangs about until a call is considered complete (I'm thinking this is an HTTP-level thing, as I'm not asking the service for a result of my method call).

The db is talked to with EF which seems to never open more than a fixed number of connections to the db (quite a low number too which is cool) and the WCF endpoint from the call reception back seems super reliable.

The problem seems to be on the path from the queue processor to the WCF endpoint.

The queue processor has a single instance of my WCF endpoint client which it reuses for all calls (is it good practice to rebuild this client per call, bearing in mind the number of calls here?).

Final note:

It's a peculiar "module" of functionality, under heavy load for hours at a time it's stable, but for some reason this odd thing happens resulting in the whole lot just stopping and not recovering. The call is wrapped in a try catch, but seemingly even if the catch is hit (which isn't guaranteed) the code doesn't recover / drop out as expected ... it just hangs.

Any ideas?

Please let me know if there's anything else I can add to help resolve this.

Edit 1:

binding - basicHttpBinding

error handling - no code written other than wrapping the WCF call in a try catch.

Solution

My solution appears to be to increase the timeout settings in the client config to allow the server more time to respond.

The net result is that whilst the database is busy saving data (effectively the slowest part of this process) the calling client sits and waits (on all threads, but seemingly not for as long as I would have liked).

This issue seems to be the net result of making a lot of multithreaded calls to the WCF endpoint and not giving it enough time to respond.

The high load is not continuous; the service usage seems to spike and then tail off. Adding to the expected response time allows spikes to be filtered through as they happen.
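
For illustration, here is a minimal sketch of what such a change could look like in the client's app.config, assuming the basicHttpBinding mentioned in the question. The binding name, endpoint address, contract name and the timeout values are all placeholders, not my real settings. On the client, sendTimeout is the one that covers the whole call (sending the request and waiting for the response), and it defaults to one minute.

```xml
<!-- Client-side sketch only: names, address and values below are hypothetical -->
<system.serviceModel>
  <bindings>
    <basicHttpBinding>
      <!-- sendTimeout bounds each call from the client's point of view;
           open/close bound channel setup and teardown -->
      <binding name="queueProcessorBinding"
               openTimeout="00:01:00"
               closeTimeout="00:01:00"
               sendTimeout="00:05:00"
               receiveTimeout="00:10:00" />
    </basicHttpBinding>
  </bindings>
  <client>
    <!-- Placeholder endpoint that picks up the binding settings above -->
    <endpoint name="processingEndpoint"
              address="http://example.invalid/ProcessingService.svc"
              binding="basicHttpBinding"
              bindingConfiguration="queueProcessorBinding"
              contract="IProcessingService" />
  </client>
</system.serviceModel>
```

How large the values should be depends on how long the slowest database save takes under a spike, so treat the numbers as something to tune rather than copy.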

A key note: way too many calls may result in the server / service treating them as a DoS-type attack, and as such it may simply terminate the connection. This isn't what I'm getting, but some fine tuning and time may result in this ...

Time for some bigger servers !!!

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow