Question

We use AWS's DynamoDB Session Provider in our app to store session data.

I recently moved to an environment where I can have NewRelic monitoring my app, and it started throwing alerts regarding DynamoDB access. However, NewRelic is the only monitoring tool that picks this up. I cannot see anything related to this problem in my application logging (log4net) or the Windows event viewer.

I searched a lot and even went through the source code of the provider but came out empty.

I'm getting (400) Bad Request from what seems to be all the calls made during a period of 1 or 2 minutes at a time, which happens 3 or 4 times per hour.

The stacktrace I could get is not promising:

at System.Net.HttpWebRequest.GetResponse()
at System.Net.HttpWebRequest.GetResponse()
at Amazon.Runtime.AmazonWebServiceClient.getResponseCallback(IAsyncResult result)

And the offending URL is:

dynamodb.us-east-1.amazonaws.com/Stream/GetResponse

From the time-graphs below we can see that all requests are fine most of the time (graph 1), but when the problem occurs the number of successful requests made to DynamoDB drops to 0 (graph 1). At the same time, there is a spike in the number of errors thrown (graph 2).


UPDATE: During a low-usage period on the weekend I ran Fiddler on the production server to see what the error from AWS looks like. I'm getting "The conditional request failed", which seems to happen because the value was updated while an old value was being requested, so the stored value no longer matches what was expected. Below is a full request/response as a sample.

Request:


POST https://dynamodb.us-east-1.amazonaws.com/ HTTP/1.1
X-Amz-Target: DynamoDB_20120810.UpdateItem
Content-Type: application/x-amz-json-1.0
User-Agent: aws-sdk-dotnet-35/2.0.15.0 .NET Runtime/4.0 .NET Framework/4.0 OS/6.2.9200.0 SessionStateProvider TableSync
Host: dynamodb.us-east-1.amazonaws.com
X-Amz-Date: 20140510T153947Z
X-Amz-Content-SHA256: e7a4886acac6ccf16f0da9be962d3a68bd50e381c202277033d0d2bb3208aa8a
Authorization: AWS4-HMAC-SHA256 Credential=redacted/20140510/us-east-1/dynamodb/aws4_request, SignedHeaders=content-type;host;user-agent;x-amz-content-sha256;x-amz-date;x-amz-target, Signature=redacted
Accept: application/json
X-NewRelic-ID: redacted
X-NewRelic-Transaction: redacted
Content-Length: 399

{
    "TableName": "ASP.NET_SessionState",
    "Key": {
        "SessionId": {
            "S": "redacted"
        }
    },
    "AttributeUpdates": {
        "LockId": {
            "Value": {
                "S": "42a9ed29-7a92-4455-8733-2f56c7d974b3"
            },
            "Action": "PUT"
        },
        "Locked": {
            "Value": {
                "N": "1"
            },
            "Action": "PUT"
        },
        "LockDate": {
            "Value": {
                "S": "2014-05-10T15:39:47.324Z"
            },
            "Action": "PUT"
        }
    },
    "Expected": {
        "Locked": {
            "Value": {
                "N": "0"
            },
            "Exists": true
        }
    },
    "ReturnValues": "ALL_NEW"
}

Response:


HTTP/1.1 400 Bad Request
x-amzn-RequestId: redacted
x-amz-crc32: redacted
Content-Type: application/x-amz-json-1.0
Content-Length: 120
Date: Sat, 10 May 2014 15:33:17 GMT

{
    "__type": "com.amazonaws.dynamodb.v20120810#ConditionalCheckFailedException",
    "message": "The conditional request failed"
}

[Graph 1: Non Web Transactions]

[Graph 2: Errors]

Any help is appreciated. Thanks!


Solution

The conditional lock failure can occur if your application makes multiple simultaneous requests that access the session state. This is common with Ajax calls. The article "The Downsides of ASP.NET Session State" gives a good explanation of how ASP.NET serializes access to a particular session state, along with some workarounds:

The first issue we'll look at is one that a lot developers don't know about; by default the ASP.NET pipeline will not process requests belonging to the same session concurrently. It serialises them, i.e. it queues them in the order that they were received so that they are processed serially rather than in parallel. [...]

These errors should not be bubbling up to the application level. The AWS SDK for .NET throws an exception for a conditional update failure, which the session provider interprets as a failure to get the lock. That result is passed back to the ASP.NET framework, which queues the request until it can get the lock:

[...] This means that if a request is in progress and another request from the same session arrives, it will be queued to only begin executing when the first request has finished. Why does ASP.NET do this? For concurrency control, so that multiple requests (i.e. multiple threads) do not read and write to session state in an inconsistent way.
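To make the mechanism concrete, here is a minimal in-memory sketch of a lock taken through a conditional write. It is only an illustration in Python, not the session provider's actual code: `FakeSessionTable`, `try_acquire_lock` and `release_lock` are hypothetical names, and the table is a plain dictionary rather than DynamoDB. The `expected` argument plays the role of the `Expected` clause in the request from the question.

```python
import uuid


class ConditionalCheckFailed(Exception):
    """Stands in for DynamoDB's ConditionalCheckFailedException (HTTP 400)."""


class FakeSessionTable:
    """Hypothetical in-memory stand-in for the ASP.NET_SessionState table."""

    def __init__(self):
        self.items = {}

    def put_session(self, session_id):
        self.items[session_id] = {"Locked": 0, "LockId": None}

    def update_item(self, session_id, updates, expected):
        item = self.items[session_id]
        # Mirror the "Expected" clause: every listed attribute must match,
        # otherwise the write is rejected and nothing is changed.
        for name, value in expected.items():
            if item.get(name) != value:
                raise ConditionalCheckFailed("The conditional request failed")
        item.update(updates)
        return dict(item)


def try_acquire_lock(table, session_id):
    """Return a lock id on success, or None if the session is already locked."""
    lock_id = str(uuid.uuid4())
    try:
        table.update_item(
            session_id,
            {"Locked": 1, "LockId": lock_id},
            expected={"Locked": 0},
        )
    except ConditionalCheckFailed:
        # Not a real error: another request of the same session holds the lock.
        return None
    return lock_id


def release_lock(table, session_id, lock_id):
    """Unlock the session, but only if we still own the lock."""
    table.update_item(
        session_id,
        {"Locked": 0, "LockId": None},
        expected={"LockId": lock_id},
    )


table = FakeSessionTable()
table.put_session("abc")
first = try_acquire_lock(table, "abc")   # first request gets the lock
second = try_acquire_lock(table, "abc")  # concurrent request is rejected
release_lock(table, "abc", first)
third = try_acquire_lock(table, "abc")   # succeeds once the lock is released
```

In this sketch the second call fails exactly the way the captured `UpdateItem` does: the condition `Locked == 0` no longer holds. The caller treats that as "try again later", which is why nothing shows up in the application log even though a tool that records every exception still sees it.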

Other tips

Update

Norm Johanson's answer surfaces the root cause of the issue at hand. I'm keeping my answer, adjusted accordingly, for the parts that still apply and for the pointers to related issues.


Initial Answer

I haven't faced the exact issue you describe, but it rings a bell regarding similar patterns encountered in the context of investigating the AWS API's Eventual Consistency, see e.g. my answer to Deterministically creating and tagging EC2 instances for more on this. Things have considerably improved since then:

Now, what I suspect is something like this:

  • New Relic is instrumenting the .NET byte code, which allows them to e.g. log all exceptions, regardless of whether they are handled or not.
  • Your client is e.g. getting throttled for request limit violations, which causes a retryable 400 - ThrottlingException as per the API Error Codes. That is, it triggers an exception that is handled and kicks off the exponential retry in turn, so the request eventually succeeds and leaves no trace for other tools.
    • Update: the exceptions at hand turn out to be the non-retryable 400 - ConditionalCheckFailedException, thus this suspicion doesn't apply here.
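The difference between the two kinds of 400 can be read from the `__type` field of the error body, as in the response captured in the question. The following is a rough Python sketch of that classification, not the SDK's own logic: `error_code`, `is_retryable` and the `RETRYABLE` set are hypothetical names, the set is deliberately incomplete, and the throttling body is made up for illustration.

```python
import json

# Assumption: a deliberately incomplete set of error codes that are safe to retry.
RETRYABLE = {"ThrottlingException", "ProvisionedThroughputExceededException"}


def error_code(body):
    """Extract the short error name that follows '#' in the __type field."""
    type_field = json.loads(body).get("__type", "")
    return type_field.rsplit("#", 1)[-1]


def is_retryable(status, body):
    """Server-side errors (5xx) and throttling are retried; other 4xx are not."""
    if status >= 500:
        return True
    return error_code(body) in RETRYABLE


# Body taken from the response in the question.
conditional = (
    '{"__type": "com.amazonaws.dynamodb.v20120810#ConditionalCheckFailedException",'
    ' "message": "The conditional request failed"}'
)
# Hypothetical throttling body, for comparison only.
throttled = '{"__type": "example#ThrottlingException", "message": "Rate exceeded"}'

print(error_code(conditional), is_retryable(400, conditional))
print(error_code(throttled), is_retryable(400, throttled))
```

Both responses arrive as HTTP 400, so a monitoring tool that only looks at the status code cannot tell a throttled call that will be retried from a failed condition that will not.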

In that case, the question obviously is what might be causing this. Even though the issue description doesn't match yours, the discussion in "Performance issue in 2.0.12.0" hints at an ongoing threading issue in the 2.0.x releases of the .NET SDK, which might surface differently depending on the usage pattern at hand.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow