Question

The following code base worked for years on Windows Server and Windows XP, but became unreliable on Windows 7.

Two COM+ components, caller and callee, share the same transaction context and participate in indirect recursion. The caller initially calls the callee multiple times, and the callee always eventually calls SetComplete or SetAbort before returning (whether the call chain is Caller->Callee or a deeper one such as Caller->Callee->Caller->Callee).

What I see is that the first SQL query inside the second invocation of the callee errors out with:

Distributed transaction completed. Either enlist this session in a new transaction or the NULL transaction

In some configurations the transactions are very slow and can easily run longer than 60 seconds.

The error looks like a transaction timeout, or perhaps a component deactivation following the suspect SetComplete, which sets the done bit. However, the component calling it was not the root of the transaction, and the documentation specifically says: "However, unless the object calling SetAbort is the root of the transaction, the transaction continues to run even though nothing can save it from eventually aborting." It also says: "A transaction is neither committed nor aborted until the root object of the transaction deactivates." So I am not sure whether the SetComplete could be contributing to the error.
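To make the two quoted statements concrete, here is a toy model in plain C++. It is only a sketch of what the documentation says, not COM+ code: `ToyTransaction`, `participantReturns` and the `doomed` flag are hypothetical names invented for illustration, and the model says nothing about timeouts or about how any particular Windows version behaves.

```cpp
// Toy model of the quoted documentation; none of these names are COM+ APIs.
enum class Outcome { Running, Committed, Aborted };

struct ToyTransaction {
    bool doomed = false;                 // set once any participant votes to abort
    Outcome outcome = Outcome::Running;  // decided only when the root deactivates

    // Models a participant returning after SetComplete (votedComplete == true)
    // or SetAbort (votedComplete == false).
    void participantReturns(bool isRoot, bool votedComplete) {
        if (outcome != Outcome::Running) {
            return;  // the transaction has already ended
        }
        if (!votedComplete) {
            doomed = true;  // "nothing can save it from eventually aborting"
        }
        if (isRoot) {
            // Only the root's deactivation commits or aborts the transaction.
            outcome = doomed ? Outcome::Aborted : Outcome::Committed;
        }
    }
};
```

In this model, `participantReturns(false, true)` from a non-root callee leaves `outcome` at `Outcome::Running`, which is the behavior the documentation leads me to expect from a SetComplete in the callee.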

The transaction timeout is set to 1800 seconds at the component level for all components at install time; the global transaction timeout is set to 60 seconds. Components mostly use ADO (including disconnected result sets) or .NET SqlClient for database access.

I don't understand several aspects of what I am seeing.

  1. If multiple instances of the same component are active at one time in the same transaction, could a SetComplete cause early deactivation of any instance other than the one calling SetComplete, or affect how transaction timeouts are applied?
  2. Does Windows 7 deactivate components differently (faster) than the other OSes mentioned? Is there any configuration that controls this behavior?
  3. Is it possible that Windows 7 ignores the component-level transaction timeout? Defects of this kind existed long ago.

Solution

I will be happy to award the bounty to anyone who provides good insight into the questions asked. However, our original issue was tracked down to the following.

We found one particular very long-running transaction that was almost guaranteed to take slightly over 60 seconds (due to unfortunate synchronization and retry logic). Unlike the rest of the code base, its transaction context was created manually, and on Windows 7 it was inheriting the global transaction timeout, even though it executed within components that all have their declarative transaction timeouts set to 1800 seconds. Why this still worked on other operating systems is not very clear, but of course we are getting rid of this long-running transaction scenario altogether.

Here is what "created manually" means above:

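// Declare a smart-pointer type for ITransactionContextEx.
_COM_SMARTPTR_TYPEDEF(ITransactionContextEx, __uuidof(ITransactionContextEx));

// Create the transaction context explicitly instead of relying on the
// component's declarative transaction attribute.
ITransactionContextExPtr pTransContext;
TESTHR(pTransContext.CreateInstance(CLSID_TransactionContextEx));

// Create the component inside the manually created context and check the result.
ISomeSvcsPtr pSomeSvcs;
TESTHR(pTransContext->CreateInstance(__uuidof(PSSomeSvcs),
                                     ISomeSvcsPtr::GetIID(),
                                     reinterpret_cast<void **>(&pSomeSvcs)));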
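The timeout behavior we observed can be summarized with a small model in plain C++. This is only a sketch of what we saw on Windows 7, not a documented rule and not COM+ code: `TimeoutConfig`, `effectiveTimeoutSeconds` and `wouldTimeOut` are hypothetical names, and the assumption is simply that a transaction without an applied component-level timeout falls back to the global one.

```cpp
// Hypothetical model of the observed timeout selection; not a COM+ API.
struct TimeoutConfig {
    int globalSeconds;          // machine-wide default (60 in our setup)
    bool componentTimeoutUsed;  // false models the manually created context
    int componentSeconds;       // component-level value (1800 in our setup)
};

// Returns the timeout that applies to the transaction under this model.
int effectiveTimeoutSeconds(const TimeoutConfig& cfg) {
    return cfg.componentTimeoutUsed ? cfg.componentSeconds : cfg.globalSeconds;
}

// True when a transaction of the given duration outlives its timeout.
bool wouldTimeOut(const TimeoutConfig& cfg, int durationSeconds) {
    return durationSeconds > effectiveTimeoutSeconds(cfg);
}
```

With `TimeoutConfig{60, false, 1800}` a 61-second transaction exceeds the limit, while with `TimeoutConfig{60, true, 1800}` it does not, which matches a transaction that takes slightly over 60 seconds failing only when its context is created manually.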
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow