WCF Timeout Exception: Detailed Investigation

https://stackoverflow.com/questions/981475

13-09-2019

Question

We have an application with a WCF service (*.svc) running on IIS7 and various clients querying the service. The server is running Windows Server 2008. The clients are running either Windows Server 2008 or Windows Server 2003. I am getting the following exception, which I have seen can in fact be related to a large number of potential WCF issues.

System.TimeoutException: The request channel timed out while waiting for a reply after 00:00:59.9320000. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. The time allotted to this operation may have been a portion of a longer timeout. ---> System.TimeoutException: The HTTP request to 'http://www.domain.com/WebServices/myservice.svc/gzip' has exceeded the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout.

I have increased the timeout to 30 minutes and the error still occurred. This tells me that something else is at play, because the quantity of data could never take 30 minutes to upload or download.

The error comes and goes. At the moment, it is more frequent. It does not seem to matter whether I have 3 clients running simultaneously or 100; it still occurs once in a while. Most of the time there are no timeouts, but I still get a few per hour. The error comes from any of the methods that are invoked. One of these methods has no parameters and returns a small amount of data. Another takes a lot of data as a parameter but executes asynchronously. The errors always originate from the client and never reference any server code in the stack trace. It always ends with:

 at System.Net.HttpWebRequest.GetResponse()
  at System.ServiceModel.Channels.HttpChannelFactory.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout)

On the server: I have tried (and currently have) the following binding settings:

maxBufferSize="2147483647" maxReceivedMessageSize="2147483647" maxBufferPoolSize="2147483647"

It does not seem to have an impact.

I have tried (and currently have) the following throttling settings:

<serviceThrottling maxConcurrentCalls="1500" maxConcurrentInstances="1500" maxConcurrentSessions="1500"/>

It does not seem to have an impact.

I currently have the following settings for the WCF service.

[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single, ConcurrencyMode = ConcurrencyMode.Single)]

I ran with ConcurrencyMode.Multiple for a while, and the error still occurred.

I have tried restarting IIS, restarting my underlying SQL Server, and restarting the machine. None of these seems to have an impact.

I have tried disabling the Windows firewall. It does not seem to have an impact.

On the client, I have these settings:

maxReceivedMessageSize="2147483647"

<system.net>
    <connectionManagement>
        <add address="*" maxconnection="16"/>
    </connectionManagement>
</system.net>

My client closes its connections:

var client = new MyClient();

try
{
    return client.GetConfigurationOptions();
}
finally
{
    client.Close();
}

I have changed the registry settings to allow more outgoing connections:

MaxConnectionsPerServer=24, MaxConnectionsPer1_0Server=32.

I have just recently tried SvcTraceViewer.exe. I managed to catch one exception on the client end. I see that its duration is 1 minute. Looking at the server-side trace, I can see that the server is not aware of this exception. The maximum duration I can see is 10 seconds.

I have looked at the active database connections using exec sp_who on the server. I only have a few (2-3). I have looked at the TCP connections from one client using TCPView. It is usually around 2-3, and I have seen up to 5 or 6.

Simply put, I am stumped. I have tried everything I could find, and I must be missing something very simple that a WCF expert would be able to see. My gut feeling is that something is blocking my clients at a low level (TCP) before the server actually receives the message, and/or that something is queuing the messages at the server level and never letting them be processed.

If there are any performance counters I should look at, please let me know. (Please indicate which values are bad, as some of these counters are hard to decipher.) Also, how could I log the WCF message size? Finally, are there any tools out there that would allow me to test how many connections I can establish between my client and server (independently of my application)?

Thanks for your time!

Extra information added June 20th:

My WCF application does something similar to the following.

while (true)
{
   Step1GetConfigurationSettingsFromServerViaWCF(); // can change between calls
   Step2GetWorkUnitFromServerViaWCF();
   DoWorkLocally(); // takes 5-15minutes. 
   Step3SendBackResultsToServerViaWCF();
}

Using Wireshark, I saw that when the error occurs, I have five TCP retransmissions followed by a TCP reset later on. My guess is that the reset (RST) comes from WCF killing the connection. The exception report I get is from Step3 timing out.

I discovered this by looking at the TCP stream "tcp.stream eq 192". I then expanded my filter to "tcp.stream eq 192 and http and http.request.method eq POST" and saw 6 POSTs during this stream. This seemed odd, so I checked another stream, such as tcp.stream eq 100. It had three POSTs, which seems a bit more normal because I am making three calls. However, I close my connection after every WCF call, so I would have expected one call per stream (but I don't know much about TCP).

Investigating a bit more, I dumped the HTTP packet payloads to disk to see what these six calls were.

1) Step3
2) Step1
3) Step2
4) Step3 - corrupted
5) Step1
6) Step2

My guess is that two concurrent clients are using the same connection, which is why I saw duplicates. However, I still have a few more issues that I can't comprehend:

a) Why is the packet corrupted? A random network fluke, maybe? The payload is gzipped using this sample code: http://msdn.microsoft.com/en-us/library/ms751458.aspx - Could the code be buggy once in a while when used concurrently? I should test without the GZip library.

b) Why would I see Step1 and Step2 running AFTER the corrupted operation timed out? It seems to me that these operations should not have occurred. Maybe I am not looking at the right stream because my understanding of TCP is flawed. I have other streams that occur at the same time. I should investigate other streams: a quick glance at streams 190-194 shows that the Step3 POSTs have proper payload data (not corrupted). This pushes me to look at the GZip library again.

Solution

If you are using a .NET client, you may not have set

// Maximum number of outgoing connections you can make to a single endpoint. The default value is 2.
System.Net.ServicePointManager.DefaultConnectionLimit = 200;

Here is the original question and answer: WCF Service Throttling

Update:

This setting goes in the .NET client application. It can be set at startup or at any other point, as long as it happens before you start your tests.

You can also put it in the app.config file, as follows:

<system.net>
    <connectionManagement>
        <add maxconnection="200" address="*" />
    </connectionManagement>
</system.net>

Other tips

If you haven't tried it already, wrap your server-side WCF operations in try/finally blocks and add logging to ensure they are actually returning.
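A minimal sketch of the try/finally logging suggested here. The OperationLog helper, its Run method and the log delegate are hypothetical names introduced for illustration; they are not part of WCF. The operation body is passed in as a delegate so that the exit line is written whether the body returns or throws.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of WCF): logs entry and exit of a service operation.
public static class OperationLog
{
    public static T Run<T>(string operationName, Func<T> body, Action<string> log)
    {
        var watch = Stopwatch.StartNew();
        log(string.Format("{0:o} ENTER {1}", DateTime.UtcNow, operationName));
        try
        {
            return body();
        }
        finally
        {
            // Runs on success and on exception, so a missing EXIT line
            // means the operation never returned.
            log(string.Format("{0:o} EXIT {1} after {2} ms",
                DateTime.UtcNow, operationName, watch.ElapsedMilliseconds));
        }
    }
}
```

Inside a service method this could be used as `return OperationLog.Run("GetConfigurationOptions", () => LoadOptions(), line => Trace.WriteLine(line));`, where LoadOptions stands in for the real work.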

If those show that the Operations are completing, then my next step would be to go to a lower level, and look at the actual transport layer.

Wireshark or another similar packet capturing tool can be quite helpful at this point. I'm assuming this is running over HTTP on standard port 80.

Run Wireshark on the client. In the Options when you start the capture, set the capture filter to tcp port 80 and host service.example.com - this will reduce the amount of irrelevant traffic.

If you can, modify your client to notify you the exact start time of the call, and the time when the timeout occurred. Or just monitor it closely.
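One way to record those two timestamps on the client is sketched below. TimedCall, Invoke and the log delegate are hypothetical names, and the sketch assumes the timeout surfaces as a System.TimeoutException, as in the exception quoted in the question.

```csharp
using System;

// Hypothetical client-side wrapper: records when a call started and when it timed out.
public static class TimedCall
{
    public static T Invoke<T>(string callName, Func<T> call, Action<string> log)
    {
        DateTime startedUtc = DateTime.UtcNow;
        try
        {
            return call();
        }
        catch (TimeoutException)
        {
            // Both timestamps are in UTC so they can be lined up with a packet capture.
            log(string.Format("{0} started {1:o}, timed out {2:o}",
                callName, startedUtc, DateTime.UtcNow));
            throw; // rethrow so existing error handling still sees the timeout
        }
    }
}
```

A call such as `TimedCall.Invoke("Step3", () => client.SendResults(data), Console.WriteLine)` (SendResults being a placeholder for the real operation) would then give the time window to search for in the Wireshark capture.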

When you get an error, then you can trawl through the Wireshark logs to find the start of the call. Right click on the first packet that has your client calling out on it (Should be something like GET /service.svc or POST /service.svc) and select Follow TCP Stream.

Wireshark will decode the entire HTTP Conversation, so you can ensure that WCF is actually sending back responses.

from: http://www.codeproject.com/KB/WCF/WCF_Operation_Timeout_.aspx

To avoid this timeout error, we need to configure the OperationTimeout property for Proxy in the WCF client code. This configuration is something new unlike other configurations such as Send Timeout, Receive Timeout etc., which I discussed early in the article. To set this operation timeout property configuration, we have to cast our proxy to IContextChannel in WCF client application before calling the operation contract methods.
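The cast described in the quoted article could look roughly like the sketch below. ProxyTimeouts and SetOperationTimeout are hypothetical names; the sketch assumes the proxy is a generated client deriving from ClientBase<TChannel>, whose InnerChannel exposes IContextChannel.OperationTimeout.

```csharp
using System;
using System.ServiceModel;

// Hypothetical helper: sets the operation timeout on a generated WCF client proxy.
public static class ProxyTimeouts
{
    public static void SetOperationTimeout<TChannel>(ClientBase<TChannel> proxy, TimeSpan timeout)
        where TChannel : class
    {
        // InnerChannel is an IClientChannel, which also implements IContextChannel.
        IContextChannel channel = proxy.InnerChannel;
        channel.OperationTimeout = timeout;
    }
}
```

With the client from the question this would be called before the first operation, for example `ProxyTimeouts.SetOperationTimeout(client, TimeSpan.FromMinutes(10));`. The ten-minute value is only an illustration.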

I'm having a very similar problem. In the past, this has been related to serialization problems. If you are still having this problem, can you verify that you can correctly serialize the objects you are returning. Specifically, if you are using Linq-To-Sql objects that have relationships, there are known serialization problems if you put a back reference on a child object to the parent object and mark that back reference as a DataMember.

You can verify serialization by writing a console app that serializes and deserializes your objects using the DataContractSerializer on the server side and whatever serialization methods your client uses. For example, in our current application, we have both WPF and Compact Framework clients. I wrote a console app to verify that I can serialize using a DataContractSerializer and deserialize using an XmlSerializer. You might try that.
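A minimal sketch of such a round-trip check, using DataContractSerializer on both sides for simplicity. WorkUnit and RoundTrip are hypothetical stand-ins for the real data contracts; swap in the types the service actually returns.

```csharp
using System.IO;
using System.Runtime.Serialization;

// Hypothetical data contract standing in for a type returned by the service.
[DataContract]
public class WorkUnit
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Payload { get; set; }
}

public static class RoundTrip
{
    // Serializes the value to memory and reads it back.
    // A serialization problem shows up as an exception here instead of a client timeout.
    public static T Clone<T>(T value)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, value);
            stream.Position = 0;
            return (T)serializer.ReadObject(stream);
        }
    }
}
```

Running `RoundTrip.Clone(...)` over representative objects, including ones with large child collections, is a quick way to rule serialization in or out before looking at the network.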

Also, if you are returning Linq-To-Sql objects that have child collections, you might try to ensure that you have eagerly loaded them on the server side. Sometimes, because of lazy loading, the objects being returned are not populated and may cause the behavior you are seeing where the request is sent to the service method multiple times.

If you have solved this problem, I'd love to hear how because I'm stuck with it too. I have verified that my issue is not serialization so I'm at a loss.

UPDATE: I'm not sure if it will help you any but the Service Trace Viewer Tool just solved my problem after 5 days of very similar experience to yours. By setting up tracing and then looking at the raw XML, I found the exceptions that were causing my serialization problems. It was related to Linq-to-SQL objects that occasionally had more child objects than could be successfully serialized. Adding the following to your web.config file should enable tracing:

<system.diagnostics>
  <!-- The System.ServiceModel.MessageLogging source only produces output when
       <messageLogging> is also enabled under <system.serviceModel>/<diagnostics>. -->
  <sharedListeners>
    <add name="sharedListener"
         type="System.Diagnostics.XmlWriterTraceListener"
         initializeData="c:\Temp\servicetrace.svclog" />
  </sharedListeners>
  <sources>
    <source name="System.ServiceModel" switchValue="Verbose, ActivityTracing">
      <listeners>
        <add name="sharedListener" />
      </listeners>
    </source>
    <source name="System.ServiceModel.MessageLogging" switchValue="Verbose">
      <listeners>
        <add name="sharedListener" />
      </listeners>
    </source>
  </sources>
</system.diagnostics>

The resulting file can be opened with the Service Trace Viewer Tool or just in IE to examine the results.

Are you closing the connection to the WCF service in between requests? If you don't, you'll see this exact timeout (eventually).
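Closing reliably takes a little care, because Close() itself can throw when the channel is already faulted or the close times out. A minimal sketch of the usual close-or-abort pattern follows; ClientCloser is a hypothetical helper name, and it works on any ICommunicationObject, which generated WCF clients implement.

```csharp
using System;
using System.ServiceModel;

// Hypothetical helper: releases a WCF client without letting Close() leak the connection.
public static class ClientCloser
{
    public static void CloseOrAbort(ICommunicationObject client)
    {
        if (client == null)
        {
            return;
        }

        try
        {
            if (client.State == CommunicationState.Faulted)
            {
                client.Abort(); // Close() would throw on a faulted channel
            }
            else
            {
                client.Close();
            }
        }
        catch (CommunicationException)
        {
            client.Abort();
        }
        catch (TimeoutException)
        {
            client.Abort();
        }
    }
}
```

In a finally block, `ClientCloser.CloseOrAbort(client);` would take the place of a bare `client.Close();`.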

I've just solved the problem. I found that the nodes in the App.config file were configured incorrectly.

<client>
<endpoint name="WCF_QtrwiseSalesService" binding="wsHttpBinding" bindingConfiguration="ws" address="http://cntgbs1131:9005/MyService/TGE.ISupplierClientManager" contract="*">
</endpoint>
</client>

<bindings>
    <wsHttpBinding>
        <binding name="ws" maxBufferPoolSize="2147483647" maxReceivedMessageSize="2147483647" messageEncoding="Text">
            <readerQuotas maxDepth="2147483647" maxStringContentLength="2147483647" maxArrayLength="2147483647" maxBytesPerRead="2147483647" maxNameTableCharCount="2147483647"/>
            <security mode="None">
                <transport clientCredentialType="None"></transport>
            </security>
        </binding>
    </wsHttpBinding>
</bindings>

Confirm that in the <security> node of your config, the value of the "mode" attribute is "None". If the value is "Transport", the error occurs.

This MSDN Blogs link may help you:

http://blogs.msdn.com/tess/archive/2009/01/09/net-hang-my-application-hangs-after-i-called-my-wcf-service-a-couple-of-times.aspx

Did you try using clientVia to see the message sent, using SOAP toolkit or something like that? This could help to see if the error is coming from the client itself or from somewhere else.

Did you check the WCF traces? WCF has a tendency to swallow exceptions and only return the last exception, which is the timeout that you're getting, since the end point didn't return anything meaningful.

You will also receive this error if you are passing an object back to the client that contains a property of an enum type that is not set by default, and that enum does not have a value that maps to 0, e.g. enum MyEnum { a = 1, b = 2 };
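The sketch below reproduces that situation outside WCF. Reply and EnumCheck are hypothetical names. My understanding is that DataContractSerializer refuses to write an enum value that has no matching member, which is what an unassigned property of this enum holds; treat that as an assumption to confirm in your own environment.

```csharp
using System.IO;
using System.Runtime.Serialization;

public enum MyEnum { a = 1, b = 2 }   // no member maps to 0

// Hypothetical reply type with an enum property that may be left unassigned.
[DataContract]
public class Reply
{
    [DataMember]
    public MyEnum Status { get; set; }   // holds (MyEnum)0 until assigned
}

public static class EnumCheck
{
    // Returns false when the serializer rejects the object.
    public static bool CanSerialize(object value)
    {
        var serializer = new DataContractSerializer(value.GetType());
        try
        {
            using (var stream = new MemoryStream())
            {
                serializer.WriteObject(stream, value);
            }
            return true;
        }
        catch (SerializationException)
        {
            return false;
        }
    }
}
```

Comparing `EnumCheck.CanSerialize(new Reply())` with `EnumCheck.CanSerialize(new Reply { Status = MyEnum.a })` shows whether an unassigned enum is the culprit. Adding a member with the value 0 to the enum, or always assigning the property, avoids the problem.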

Looks like this exception message is quite generic and can be received due to a variety of reasons. We ran into this while deploying the client on Windows 8.1 machines. Our WCF client runs inside of a windows service and continuously polls the WCF service. The windows service runs under a non-admin user. The issue was fixed by setting the clientCredentialType to "Windows" in the WCF configuration to allow the authentication to pass-through, as in the following:

      <security mode="None">
        <transport clientCredentialType="Windows" proxyCredentialType="None"
          realm="" />
        <message clientCredentialType="UserName" algorithmSuite="Default" />
      </security>

I'm not a WCF expert, but I'm wondering whether you are running into DDoS protection on IIS. I know from experience that if you run a bunch of simultaneous connections from a single client to a server, at some point the server stops responding to the calls because it suspects a DDoS attack. It will also hold the connections open until they time out, in order to slow the client down.

Multiple connections coming from different machines/IPs should not be a problem, however.

There's more info in this MSDN post:

http://msdn.microsoft.com/en-us/library/bb463275.aspx

Check out the MaxConcurrentSessions property.

Licensed under: CC-BY-SA with attribution

Not affiliated with Stack Overflow