Yesterday I ran into a very strange error, and after a full day I have barely made any progress, so I guess it's a good candidate for asking the community. Please be patient with me, because I think it's a tough one.
I have a C# WinForms app which hangs after a few clicks in production. This never happens in the development environment, only in production. When the hang occurs there is no error message; the GUI simply becomes unresponsive and Task Manager shows the process as "Not Responding". I tried it myself in the same environment and can confirm the behavior.
Unfortunately it is not possible to install the development tools and debug the application in the production environment. The best I could do was to take memory dumps of the application after it hung. The problem is that I don't understand what I see in the dump at all: my main thread (the GUI thread) seems to be stuck at an instruction, and I cannot find any reason for it.
Here is the stack trace of my main thread:
KERNELBASE.dll!_RaiseException@16() + 0x54 bytes
[External Code]
CFAPControlLibrary.dll!CFAPControlLibrary.Communication.Base.GetSetting(string settingName) Line 850 + 0x10 bytes C#
CFAPControlLibrary.dll!CFAPControlLibrary.ConfigHelper.Get<CFAPControlLibrary.DataTypes.ActionSortingOption>(string settingName) Line 25 + 0x35 bytes C#
CFAPControlLibrary.dll!CFAPControlLibrary.ConfigHelper.Get<CFAPControlLibrary.DataTypes.ActionSortingOption>(string settingName, CFAPControlLibrary.DataTypes.ActionSortingOption defaultVal) Line 15 + 0x9 bytes C#
CFAPControlLibrary.dll!CFAPControlLibrary.DataTypes.ActionStorage.Sort(System.Collections.Generic.List<CFAPControlLibrary.DataTypes.ActionClass> subject) Line 167 + 0xe bytes C#
CFAPControlLibrary.dll!CFAPControlLibrary.DataTypes.ActionStorage.GetByStatus(string pStatus) Line 162 + 0x46 bytes C#
CFAPControlLibrary.dll!CFAPControlLibrary.ActionSelector.FillNodes() Line 48 + 0x26 bytes C#
CFAPControlLibrary.dll!CFAPControlLibrary.CFAPMain.OnActionDetailsArrived(CFAPControlLibrary.CFAPMain.RawActionDetails bwr) Line 371 + 0x10 bytes C#
CFAPControlLibrary.dll!CFAPControlLibrary.CFAPMain.OnGetDetailsCompleted(object sender, System.ComponentModel.RunWorkerCompletedEventArgs e) Line 337 + 0xb bytes C#
user32.dll!_InternalCallWinProc@20() + 0x23 bytes
user32.dll!_UserCallWinProcCheckWow@32() + 0xb3 bytes
user32.dll!_DispatchMessageWorker@8() + 0xe6 bytes
user32.dll!_DispatchMessageW@4() + 0xf bytes
[External Code]
CFAPHost.exe!CFAPHost.Program.Main(string[] args) Line 50 + 0x1d bytes C#
[External Code]
mscoreei.dll!__CorExeMain@0() + 0x38 bytes
mscoree.dll!_ShellShim__CorExeMain@0() + 0x227 bytes
mscoree.dll!__CorExeMain_Exported@0() + 0x8 bytes
kernel32.dll!@BaseThreadInitThunk@12() + 0x12 bytes
ntdll.dll!___RtlUserThreadStart@8() + 0x27 bytes
ntdll.dll!__RtlUserThreadStart@8() + 0x1b bytes
And here is the source code for the top stack frames:
The disassembly from KernelBase.dll:
Then the last frame from my code; m_SettingCache is a Dictionary and it does not contain the requested key:
The next couple of frames:
I think the code is pretty straightforward: it's just generic setting reading with a default value. If something goes wrong (the setting name is undefined or the conversion is not possible), the default value is returned. The code surely works. What I see from the dump is that the read from the dictionary never returns; it should throw a KeyNotFoundException, but that never happens. Any suggestions?
Note: the main thread really is stopped in the state captured by the dump. Every time I take a dump the result is the same.
Note 2: the hang never happens on the first execution of this code path. In every scenario this very same code path had already been executed before the hang (deduced from the app log).
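For readers who cannot see the code, the pattern described above looks roughly like this. It is a simplified sketch, not the actual CFAPControlLibrary code: the class name `SettingsSketch`, the enum values, the cache contents, and the use of `Enum.Parse` for the conversion are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum values; the real ActionSortingOption members are not shown in this post.
public enum ActionSortingOption { ByName, ByDate }

public static class SettingsSketch
{
    // Stand-in for m_SettingCache (contents are made up).
    private static readonly Dictionary<string, string> Cache =
        new Dictionary<string, string> { { "Sorting", "ByDate" } };

    // The indexer throws KeyNotFoundException when the key is missing.
    public static string GetSetting(string settingName)
    {
        return Cache[settingName];
    }

    // Generic read with a default: any failure (missing key or
    // failed conversion) falls back to defaultVal.
    public static T Get<T>(string settingName, T defaultVal) where T : struct
    {
        try
        {
            return (T)Enum.Parse(typeof(T), GetSetting(settingName));
        }
        catch (Exception)
        {
            return defaultVal;
        }
    }
}
```

With this shape, a missing key is expected to raise a KeyNotFoundException inside `GetSetting`, which `Get<T>` then swallows and replaces with the default. That throw is the point where the dump shows the thread stuck.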
I will provide more details on request.
Thanks in advance.
Edit:
CFAPControlLibrary.dll is the main assembly of the application. It contains the Windows Forms and their corresponding logic. Communication with the server is done with WCF, and the bigger requests are made on a separate thread using a BackgroundWorker. The execution path you see in the call stack is invoked by the completion event of such a BackgroundWorker.
I pasted the requested code bits here
My AppDomain.CurrentDomain.UnhandledException handler is here
The part of the stack which I first considered irrelevant but which later proved to be important (sensitive string literals are deleted from the image):
This shows that Application.Run was called; I have no idea why it is not shown in the call stack.
Update
After spending three days without finding the cause of the problem, I decided to try a workaround. The memory dumps showed that the application always hangs at the very same point: where a KeyNotFoundException should have been thrown. The most straightforward workaround was therefore to refactor that code so that it does not throw where possible. That version passed the tests and never hung.
This is not a solution at all, but we couldn't spend any more time on this. So basically I cross my fingers, ship the code, and hope I never see this hang again.
Thank you for all the suggestions.
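The general idea of the refactoring, as a minimal sketch rather than the shipped code: look the key up with `Dictionary.TryGetValue` instead of the indexer, so that a missing key never raises an exception. The names `SettingCacheSketch`, `Set` and `GetSettingOrDefault` are hypothetical.

```csharp
using System.Collections.Generic;

public static class SettingCacheSketch
{
    // Stand-in for m_SettingCache.
    private static readonly Dictionary<string, string> Cache =
        new Dictionary<string, string>();

    public static void Set(string settingName, string value)
    {
        Cache[settingName] = value;
    }

    // TryGetValue returns false for a missing key instead of throwing,
    // so no KeyNotFoundException is ever raised on this path.
    public static string GetSettingOrDefault(string settingName, string defaultVal)
    {
        string value;
        return Cache.TryGetValue(settingName, out value) ? value : defaultVal;
    }
}
```

This only avoids the exception on the lookup itself. It does not explain why raising the exception hung in the first place.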