How can we troubleshoot intermittent “An existing connection was forcibly closed” errors caused by a Cisco CSS

StackOverflow https://stackoverflow.com/questions/3426885

Question

We have the "standard" three-tier architecture, with our middle tier hosted in IIS and accessed via .NET Remoting. These errors occur between our web and web services servers (front tier) and the app servers (middle tier) that they call via remoting. We get this error 3-10 times a day out of ~130K total calls per day.

The exception and stack trace always look similar to this:


Exception Type: System.Net.WebException
Message: The underlying connection was closed: An unexpected error occurred on a receive.

Server stack trace: 
   at System.Runtime.Remoting.Channels.Http.HttpClientTransportSink.ProcessResponseException(WebException webException, HttpWebResponse& response)
   at System.Runtime.Remoting.Channels.Http.HttpClientTransportSink.ProcessMessage(IMessage msg, ITransportHeaders requestHeaders, Stream requestStream, ITransportHeaders& responseHeaders, Stream& responseStream)
   at System.Runtime.Remoting.Channels.BinaryClientFormatterSink.SyncProcessMessage(IMessage msg)

Exception rethrown at [0]: 
   at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
   at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
   at XXXXX.BusinessFacade.Interface.XXXXInterface.SubmitXXXX(
   at XXX.XXXXWebServicesLibrary.XXXXService.CreateXXXXXX.RunXXXXMethod()
   at XXX.XXXXWebServicesLibrary.XXXXService.XXXXXXMethod`2.RunMethod()
   at XXX.XXXXWebServicesLibrary.XXXXXWebMethod`2.Run()
Inner Exception: 

Exception Type: System.IO.IOException
Message: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
   at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   at System.Net.PooledStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   at System.Net.Connection.SyncRead(HttpWebRequest request, Boolean userRetrievedStream, Boolean probeRead)
Inner Exception: 

Exception Type: System.Net.Sockets.SocketException
Message: An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
   at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)

No particular remoting call causes this; it can be any of them, which seems to rule out an application-specific cause. The only common denominator is the "Exception Type: System.Net.Sockets.SocketException Message: An existing connection was forcibly closed by the remote host" portion of the error.

The front and middle tiers are separated by a firewall and we are also utilizing a VIP device. I strongly suspect an issue with our network/firewall configuration but our network guys are just scratching their heads and not offering any suggestions.

Although a 0.003% failure rate may seem insignificant, we have partners that scrutinize our communications very carefully and I am just waiting for this to become an issue they notice. I don't want to have to say "I don't know" when that time comes.

Does anyone have any ideas on how I could provide more information or any suggestions I could make to our network guys to get this resolved?


Solution

The problem was the Cisco CSS. We determined this by pointing the tier 1 servers directly to the tier 2 servers and going 48 hours without observing the problem. Once we determined it was the CSS, we corrected this problem by adjusting the insanely low default value for this parameter:

"Default flow inactivity timeouts, in seconds, for the TCP or UDP port. If a flow is idle for the amount of time specified in the timeout value, the CSS tears down the flow and reclaims the flow resources."

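We set this to 84 (that is, 84 16-second increments, or 1,344 seconds). Since the default keep-alive for HTTP is 120 seconds, the default value was too low: the CSS could tear down an idle flow while the servers on either side still considered the connection open.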
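For anyone wanting to check this kind of idle teardown themselves, here is a minimal sketch in Python. It is not what we ran; it only illustrates the idea. The 16-second increment and the 120-second keep-alive are the figures quoted above; the function names, the HEAD request, and the host/port arguments are hypothetical and would need adapting to your environment.

```python
import socket
import time

CSS_INCREMENT_SECONDS = 16     # per the answer: the setting counts 16-second increments
HTTP_KEEPALIVE_SECONDS = 120   # keep-alive figure quoted in the answer


def flow_timeout_seconds(multiplier: int) -> int:
    """Convert the configured multiplier into seconds."""
    return multiplier * CSS_INCREMENT_SECONDS


def survives_idle(host: str, port: int, idle_seconds: float, timeout: float = 5.0) -> bool:
    """Open a TCP connection, leave it idle, then try one HTTP request on it.

    Returns True if any response bytes come back, False if the connection
    was closed, reset, or timed out while reading.
    """
    request = (
        f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: keep-alive\r\n\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        time.sleep(idle_seconds)  # simulate a pooled connection sitting unused
        try:
            sock.sendall(request)
            # An empty read means the peer closed the connection.
            return len(sock.recv(1024)) > 0
        except OSError:
            # Covers connection resets ("forcibly closed") and timeouts.
            return False


# The value we chose comfortably exceeds the keep-alive window.
assert flow_timeout_seconds(84) > HTTP_KEEPALIVE_SECONDS
```

Calling survives_idle against the VIP and then directly against a tier 2 server, with idle times on either side of the suspected timeout, should show whether the device in between is dropping idle flows.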

OTHER TIPS

To check recycling of the Application Pool, open IIS and go to the Properties of the Application Pool your remoting service runs in. You can configure recycling by time interval, by number of requests, or at specific times.

You could remove the current recycling rules and schedule recycling for a time when no connections are expected, such as 3:00 AM. Then see if the exceptions still occur.

It could be a network component causing this. The way to rule this out would be to place both machines (or test machines) on the same subnet, then run a load test, and verify that you do not get the same error.
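A load test along those lines could be sketched roughly as follows, in Python. This is an assumption-laden stand-in: it sends plain HTTP GETs over one persistent connection rather than real .NET Remoting calls, and the function name, path, and pause parameter are hypothetical. It just counts the calls that fail at the connection level.

```python
import http.client
import time


def count_failed_calls(host: str, port: int, path: str, calls: int,
                       pause_seconds: float = 0.0, timeout: float = 10.0) -> int:
    """Issue `calls` GET requests over a reused connection and count failures.

    Only connection-level failures are counted (resets, refusals, timeouts,
    dropped responses); an HTTP error status still counts as a working call.
    """
    failures = 0
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    for _ in range(calls):
        try:
            conn.request("GET", path)
            conn.getresponse().read()
        except (OSError, http.client.HTTPException):
            failures += 1
            conn.close()  # the next request() reconnects automatically
        time.sleep(pause_seconds)  # optional idle gap between calls
    conn.close()
    return failures
```

Running the same call count once through the firewall/VIP path and once on the same subnet, and comparing the two failure counts, gives the network team something concrete to look at.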

The other things that could be causing it could be:

Licensed under: CC-BY-SA with attribution