Proxy APIs timeout on API Key due to Cassandra connection issue

https://stackoverflow.com/questions/22811072

apigee

26-06-2023

Question

I'd appreciate help triaging and solving this:

I'm getting frequent periods where all the proxy APIs hang. The trace shows "???" as the HTTP status code for the requests, and after 30 seconds I get this response:

 Status Code: 504 Gateway Timeout 
 Content-Length: 177 
 Content-Type: text/xml; charset=UTF-8 

<?xml version='1.0' encoding='UTF-8'?><fault><faultstring>The Service is temporarily unavailable</faultstring><detail><errorcode>SERVICE_UNAVAILABLE</errorcode></detail></fault>

Here's what I see in the system.log for all three Cassandra servers:

>     2014-04-01 14:29:20,124 org: env: Apigee-Main-36 ERROR m.p.c.c.c.HThriftClient - HThriftClient.close() : Could not flush
> transport (to be expected if the pool is shutting down) in close for
> client: CassandraClient<10.49.192.52:9160-829>
>     org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
>       at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
> ~[libthrift-0.7.0.jar:0.7.0]
>       at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:156)
> ~[libthrift-0.7.0.jar:0.7.0]
>       at me.prettyprint.cassandra.connection.client.HThriftClient.close(HThriftClient.java:125)
> [hector-core-1.1-3.jar:na]
>       at me.prettyprint.cassandra.connection.client.HThriftClient.close(HThriftClient.java:38)
> [hector-core-1.1-3.jar:na]
>       at me.prettyprint.cassandra.connection.HConnectionManager.closeClient(HConnectionManager.java:325)
> [hector-core-1.1-3.jar:na]
>       at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:273)
> [hector-core-1.1-3.jar:na]
>       at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113)
> [hector-core-1.1-3.jar:na]
>       at me.prettyprint.cassandra.service.template.ThriftColumnFamilyTemplate.sliceInternal(ThriftColumnFamilyTemplate.java:88)
> [hector-core-1.1-3.jar:na]
>       at me.prettyprint.cassandra.service.template.ThriftColumnFamilyTemplate.doExecuteSlice(ThriftColumnFamilyTemplate.java:46)
> [hector-core-1.1-3.jar:na]
>       at me.prettyprint.cassandra.service.template.ColumnFamilyTemplate.queryColumns(ColumnFamilyTemplate.java:113)
> [hector-core-1.1-3.jar:na]
>       at com.apigee.datastore.client.CassandraClient.get(CassandraClient.java:169)
> [datastore-1.0.0.jar:na]
>       at com.apigee.keymanagement.dao.nosql.impl.AppDaoImpl.getCredential(AppDaoImpl.java:123)
> [keymanagement-1.0.0.jar:na]
>       at com.apigee.keymanagement.dao.nosql.impl.AppDaoImpl.getConsumerKeyStatus(AppDaoImpl.java:77)
> [keymanagement-1.0.0.jar:na]
>       at com.apigee.keymanagement.util.ResourceUtil.validateConsumerKey(ResourceUtil.java:490)
> [keymanagement-1.0.0.jar:na]
>       at com.apigee.keymanagement.util.ResourceUtil.validateConsumerKey(ResourceUtil.java:475)
> [keymanagement-1.0.0.jar:na]
>       at com.apigee.keymanagement.util.ResourceUtil.getConsumerDetails(ResourceUtil.java:526)
> [keymanagement-1.0.0.jar:na]
>       at com.apigee.keymanagement.util.ResourceUtil.getConsumerDetailsForApiKey(ResourceUtil.java:596)
> [keymanagement-1.0.0.jar:na]
>       at com.apigee.keymanagement.service.OAuth2RuntimeServiceImpl.getConsumerForApiKey(OAuth2RuntimeServiceImpl.java:81)
> [keymanagement-1.0.0.jar:na]
>       at com.apigee.oauth.v2.connectors.LocalOAuthServiceConnector.getClientAttributesForApiKey(LocalOAuthServiceConnector.java:173)
> [oauthV2-1.0.0.jar:na]
>       at com.apigee.oauth.v2.OAuthServiceImpl.getClientAttributesForApiKey(OAuthServiceImpl.java:506)
> [oauthV2-1.0.0.jar:na]
>       at com.apigee.steps.oauth.v2.OAuthStepExecution.execute(OAuthStepExecution.java:401)
> [oauthV2-1.0.0.jar:na]
>       at com.apigee.messaging.runtime.steps.StepExecution.execute(StepExecution.java:97)
> [message-processor-1.0.0.jar:na]
>       at com.apigee.flow.execution.AsyncExecutionStrategy$AsyncExecutionTask.call(AsyncExecutionStrategy.java:69)
> [message-flow-1.0.0.jar:na]
>       at com.apigee.flow.execution.AsyncExecutionStrategy$AsyncExecutionTask.call(AsyncExecutionStrategy.java:51)
> [message-flow-1.0.0.jar:na]
>       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> [na:1.6.0_32]
>       at java.util.concurrent.FutureTask.run(FutureTask.java:138) [na:1.6.0_32]
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> [na:1.6.0_32]
>       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> [na:1.6.0_32]
>       at java.util.concurrent.FutureTask.run(FutureTask.java:138) [na:1.6.0_32]
>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> [na:1.6.0_32]
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> [na:1.6.0_32]
>       at java.lang.Thread.run(Thread.java:662) [na:1.6.0_32]
>     Caused by: java.net.SocketException: Broken pipe
>       at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.6.0_32]
>       at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> ~[na:1.6.0_32]
>       at java.net.SocketOutputStream.write(SocketOutputStream.java:136) ~[na:1.6.0_32]
>       at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
> ~[libthrift-0.7.0.jar:0.7.0]
>       ... 31 common frames omitted
>     2014-04-01 14:29:20,126 org: env: Apigee-Main-36 ERROR m.p.c.c.HConnectionManager - HConnectionManager.markHostAsDown() :
> MARK HOST AS DOWN TRIGGERED for host 10.49.192.52(10.49.192.52):9160
>     2014-04-01 14:29:20,126 org: env: Apigee-Main-36 ERROR m.p.c.c.HConnectionManager - HConnectionManager.markHostAsDown() :
> Pool state on shutdown:
> <ConcurrentCassandraClientPoolByHost>:{10.49.192.52(10.49.192.52):9160};
> IsActive?: true; Active: 1; Blocked: 0; Idle: 2; NumBeforeExhausted: 9
>     2014-04-01 14:29:20,127 org: env: Apigee-Main-36 ERROR m.p.c.c.c.HThriftClient - HThriftClient.close() : Could not flush
> transport (to be expected if the pool is shutting down) in close for
> client: CassandraClient<10.49.192.52:9160-828>
>     org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection timed out
>       at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
> ~[libthrift-0.7.0.jar:0.7.0]
>       at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:156)
> ~[libthrift-0.7.0.jar:0.7.0]
>       at me.prettyprint.cassandra.connection.client.HThriftClient.close(HThriftClient.java:125)
> [hector-core-1.1-3.jar:na]

Solution 2

Since I'm a paying Apigee customer, I also opened a support case.

Originally, they weren't sure whether there was a keep-alive function or a connection TTL that would force a connection to be dropped and re-established.

Here's what I got back:

On your Router/Message Processor nodes, set /proc/sys/net/ipv4/tcp_keepalive_time to 1800 seconds (30 minutes).

To do this: echo 1800 > /proc/sys/net/ipv4/tcp_keepalive_time

BE AWARE: This change is not persisted across reboots, so you should also add the setting (net.ipv4.tcp_keepalive_time = 1800) to your /etc/sysctl.conf file.

Then run:

sysctl -p

to load the values from that file.

You can use the following to check whether the value was updated:

sysctl net.ipv4.tcp_keepalive_time

Finally, restart your Message Processors.
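
The persistence step above can be sketched as a small shell helper. This is only an illustration: set_sysctl_entry is a hypothetical function name, not part of Apigee or sysctl, and it assumes the target file uses plain "key = value" lines. It replaces any existing assignment of the key so that repeated runs do not pile up duplicate entries.

```shell
#!/bin/sh
# Sketch: idempotently set a "key = value" line in a sysctl-style file.
# set_sysctl_entry is a hypothetical helper, not an Apigee or sysctl tool.
set_sysctl_entry() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    # Copy every line except an existing assignment of this exact key.
    if [ -f "$file" ]; then
        awk -F= -v k="$key" '{
            name = $1
            gsub(/[[:space:]]/, "", name)
            if (name != k) print
        }' "$file" > "$tmp"
    fi
    # Append the new assignment.
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example, to be run as root on each Router/Message Processor node:
#   set_sysctl_entry /etc/sysctl.conf net.ipv4.tcp_keepalive_time 1800
#   sysctl -p
#   sysctl net.ipv4.tcp_keepalive_time
```

The file path is a parameter so the helper can be tried against a scratch copy before touching /etc/sysctl.conf.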

The fix that was put in place is a keep-alive probe in the Hector client in the Message Processor.

The probe sends a keep-alive ping at the interval set by the tcp_keepalive_time OS setting. The reason for setting this to 30 minutes is that your firewall's idle timeout is 3600 seconds.

The keep-alive probes need to happen more often than the firewall's idle timeout so that the connection stays in the established state.
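
The timing rule can be checked with a few lines of shell. This is a minimal sketch: keepalive_beats_firewall is a hypothetical helper name, and the 3600-second firewall timeout is the value quoted in this answer, so substitute your own. The 7200-second figure is the usual Linux default for tcp_keepalive_time, which suggests why an untuned node would not keep these connections alive behind such a firewall.

```shell
#!/bin/sh
# Sketch: succeed only if the keepalive idle time (seconds) is shorter than
# the firewall idle timeout (seconds), so a probe is sent before the
# firewall drops the idle connection.
# keepalive_beats_firewall is a hypothetical helper name.
keepalive_beats_firewall() {
    keepalive=$1
    firewall_idle=$2
    [ "$keepalive" -lt "$firewall_idle" ]
}

# Value recommended in this answer against the quoted firewall timeout.
if keepalive_beats_firewall 1800 3600; then
    echo "1800 s keepalive: probe is sent before the 3600 s timeout"
fi

# Usual Linux default keepalive time against the same firewall timeout.
if ! keepalive_beats_firewall 7200 3600; then
    echo "7200 s keepalive: connection is dropped before the first probe"
fi

# On a live Linux node the current value can be read with:
#   cat /proc/sys/net/ipv4/tcp_keepalive_time
```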

Other tips

This can happen when Cassandra is under load and marks itself as down.

During this process it closes the connections in its pool.

Please restart the Message Processors to re-establish the connections.

Let me know how it goes.

Regards, Jagjyot.

Licensed under: CC-BY-SA with attribution
