Question

Please help! I'm really at my wits' end. My program is a small personal notes manager (search for "CintaNotes"). On some computers (and of course I own none of them) it crashes with an unhandled exception just after startup. Nothing stands out about these computers, except that they tend to have AMD CPUs.

Environment: Windows XP, Visual C++ 2005/2008, raw WinApi.

Here is what is certain about this "Heisenbug":

1) The crash happens only in the Release version.

2) The crash goes away as soon as I remove all GDI-related stuff.

3) BoundsChecker reports no problems.

4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?

Any ideas would be greatly appreciated!

UPDATE: I've managed to get the app debugged on a "faulty" PC. The results:

"Unhandled exception at 0x0044a26a in CintaNotes.exe: 0xC000001D: Illegal Instruction."

and the code breaks on

0044A26A cvtsi2sd xmm1,dword ptr [esp+14h]

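So it seems that the problem was the "Code Generation/Enable Enhanced Instruction Set" compiler option. It was set to "/arch:SSE2", so the compiler emitted SSE2 instructions such as cvtsi2sd, and the program crashed on machines whose CPUs don't support SSE2. That probably also explains the AMD pattern: older AMD processors such as the Athlon XP support SSE but not SSE2. I've set this option to "Not Set" and the bug is gone. Phew!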

Thank you all very much for your help!
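The fix above removes SSE2 code generation entirely. If you ever want to keep /arch:SSE2 for a build, one alternative (a sketch, not what was done here) is to test for SSE2 at startup and show a clear error instead of crashing with an illegal instruction. The names cpu_has_sse2 and check_cpu_or_report are hypothetical; the sketch assumes MSVC's __cpuid from <intrin.h> or GCC/Clang's __get_cpuid from <cpuid.h>, and relies on CPUID leaf 1 reporting SSE2 in bit 26 of EDX. The check only helps if it runs before any SSE2 instruction executes (including in global constructors that run before WinMain), so it is safest in a source file compiled without /arch:SSE2.

```cpp
#include <cstdio>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

// Returns true if the CPU reports SSE2 support (CPUID leaf 1, EDX bit 26).
// On non-x86 targets it conservatively returns false.
bool cpu_has_sse2() {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = {0, 0, 0, 0};            // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    return (regs[3] & (1 << 26)) != 0;
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false;                      // CPUID leaf 1 not available
    }
    return (edx & (1u << 26)) != 0;
#else
    return false;
#endif
}

// Hypothetical startup guard: call it first thing in WinMain/main and
// exit cleanly if it returns false.
bool check_cpu_or_report() {
    if (!cpu_has_sse2()) {
        std::fprintf(stderr, "This build requires a CPU with SSE2 support.\n");
        return false;
    }
    return true;
}
```

In a GUI application you would probably report the problem with MessageBox rather than stderr.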

Solution

So it doesn't crash in the Debug configuration? Many things differ between the Debug and Release configurations: 1) initialization of globals, 2) the actual machine code generated, etc.

So the first step is to find out the exact setting of each parameter in Release mode compared to Debug mode.
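Alongside comparing the two configurations' property pages side by side, it can help to log which code-generation flags a binary was actually built with, so a crash report tells you which build the user ran. A minimal sketch with a hypothetical build_config_summary helper: _DEBUG and NDEBUG are the usual Debug/Release markers in Visual C++ projects, _M_IX86_FP is MSVC's macro for the 32-bit /arch setting (0 for IA32, 1 for SSE, 2 for SSE2 or higher), and __SSE2__ is the GCC/Clang equivalent. std::to_string needs a C++11 compiler; on Visual C++ 2005/2008 you would substitute sprintf.

```cpp
#include <string>

// Returns a short, human-readable summary of the build flags this
// translation unit was compiled with, suitable for writing to a log.
std::string build_config_summary() {
    std::string s;
#if defined(_DEBUG)
    s += "_DEBUG ";
#endif
#if defined(NDEBUG)
    s += "NDEBUG ";
#endif
#if defined(_M_IX86_FP)
    // MSVC, 32-bit x86 only: 0 = /arch:IA32, 1 = /arch:SSE, 2 = /arch:SSE2 or higher.
    s += "_M_IX86_FP=" + std::to_string(_M_IX86_FP) + " ";
#endif
#if defined(__SSE2__)
    s += "__SSE2__ ";   // GCC/Clang: the compiler may emit SSE2 instructions
#endif
    if (s.empty()) {
        s = "(no recognized flags)";
    }
    return s;
}
```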

-AD

OTHER TIPS

4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?

What is the underlying code in the executable / assembly? A declaration of an int generates no code at all, so it cannot crash by itself. Do you initialize the int somehow?
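To illustrate the difference with a hypothetical function (not code from CintaNotes): the declaration alone typically produces no instructions, but an initialization that converts an int to a double does. With /arch:SSE2 on 32-bit x86, MSVC typically emits that conversion as cvtsi2sd, the very instruction the UPDATE in the question shows faulting; without SSE2 the x87 instruction fild is used instead. The exact instructions depend on the compiler version and optimization settings.

```cpp
// Hypothetical helper: scale a pixel count by a DPI factor.
double scale_for_dpi(int pixels, double factor) {
    int count;            // declaration only: typically no machine code at all
    count = pixels;       // plain integer move (may be optimized away)
    double d = count;     // int -> double: cvtsi2sd under /arch:SSE2, fild without it
    return d * factor;
}
```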

To see the code where the crash happened, you should perform what is called a post-mortem analysis.

Windows Error Reporting

If you want to analyse the crash, you need a crash dump. One option is to register for Windows Error Reporting, which costs some money (you need a digital code-signing ID) and involves some form filling. For more information, visit https://winqual.microsoft.com/ .

Get the crash dump intended for WER directly from the customer

Another option is to get in touch with a user who is experiencing the crash and get the crash dump intended for WER from them directly. The user can do this by clicking "Technical details" before sending the crash report to Microsoft; the location of the crash dump file is shown there.

Your own minidump

Another option is to register your own exception handler, handle the exception and write a minidump wherever you wish. A detailed description can be found in the Code Project article "Post-Mortem Debugging Your Application with Minidumps and Visual Studio .NET".
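A minimal sketch of such a handler, assuming the DbgHelp library that ships with Windows (MiniDumpWriteDump) and a hypothetical install_minidump_handler function called early in WinMain. It writes crash.dmp to the current working directory; a real application would pick a writable location and perhaps add a timestamp. On non-Windows builds the function compiles to a stub that returns false.

```cpp
#if defined(_WIN32)
#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")   // MSVC: link against DbgHelp

// Unhandled-exception filter: writes crash.dmp in the current directory.
static LONG WINAPI write_minidump_filter(EXCEPTION_POINTERS* info) {
    HANDLE file = CreateFileW(L"crash.dmp", GENERIC_WRITE, 0, NULL,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file != INVALID_HANDLE_VALUE) {
        MINIDUMP_EXCEPTION_INFORMATION mei;
        mei.ThreadId = GetCurrentThreadId();
        mei.ExceptionPointers = info;
        mei.ClientPointers = FALSE;
        MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                          MiniDumpNormal, &mei, NULL, NULL);
        CloseHandle(file);
    }
    return EXCEPTION_EXECUTE_HANDLER;   // let the process terminate
}
#endif

// True when this build can actually install the minidump handler.
const bool kMinidumpSupported =
#if defined(_WIN32)
    true;
#else
    false;
#endif

// Installs the filter; call it early in WinMain.
// Returns false on non-Windows builds, where it does nothing.
bool install_minidump_handler() {
#if defined(_WIN32)
    SetUnhandledExceptionFilter(write_minidump_filter);
    return true;
#else
    return false;
#endif
}
```

Microsoft's documentation recommends calling MiniDumpWriteDump from a separate process where possible, since the crashing process may be in a corrupted state; the in-process version above is the common, simpler compromise. Open the resulting .dmp file in Visual Studio or WinDbg together with the matching PDB files.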

1) The crash happens only in the Release version.

That's usually a sign that you're relying on some behaviour that's not guaranteed but happens to hold in the Debug build: for example, forgetting to initialize your variables, or accessing an array out of bounds. Make sure you've turned on all the compiler checks (/RTCsuc). Also check for things like relying on the order of evaluation of function arguments (which isn't guaranteed).
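The order-of-evaluation pitfall can be shown with a small hypothetical example: two calls with side effects passed as arguments to the same function may run in either order, and a Debug and a Release build are free to choose differently. Sequencing the calls into separate statements removes the dependence. IdSource, take_two_fragile and take_two_fixed are illustrative names only.

```cpp
#include <utility>

// Hypothetical ID source whose next() has an observable side effect.
struct IdSource {
    int counter = 0;
    int next() { return ++counter; }
};

// Fragile: both next() calls are arguments to make_pair, and C++ does not
// specify which one is evaluated first, so the result may be (1, 2) or
// (2, 1) depending on the compiler and optimization settings.
std::pair<int, int> take_two_fragile(IdSource& ids) {
    return std::make_pair(ids.next(), ids.next());
}

// Robust: separate statements force the intended order.
std::pair<int, int> take_two_fixed(IdSource& ids) {
    int first = ids.next();
    int second = ids.next();
    return std::make_pair(first, second);
}
```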

2) The crash goes away as soon as I remove all GDI-related stuff.

Maybe that's a hint that you're doing something wrong with the GDI related stuff? Are you using HANDLEs after they've been freed, for example?
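One way to rule out use-after-free and leaked GDI objects is to give every handle a single owner that releases it exactly once. Below is a small, portable sketch of such an owner; ScopedHandle is a hypothetical name, and the deleter is a template parameter so the same class can wrap different handle types. Because it is move-only, a handle cannot be copied and then deleted twice.

```cpp
#include <utility>

// Owns one handle and releases it exactly once via Deleter.
// A default-constructed Handle (0 / NULL) means "owns nothing".
template <typename Handle, typename Deleter>
class ScopedHandle {
public:
    explicit ScopedHandle(Handle h = Handle(), Deleter d = Deleter())
        : h_(h), d_(d) {}
    ~ScopedHandle() { reset(); }

    ScopedHandle(const ScopedHandle&) = delete;             // no double delete
    ScopedHandle& operator=(const ScopedHandle&) = delete;
    ScopedHandle(ScopedHandle&& other) noexcept              // transfer ownership
        : h_(other.release()), d_(std::move(other.d_)) {}

    Handle get() const { return h_; }

    // Gives up ownership without releasing the handle.
    Handle release() {
        Handle h = h_;
        h_ = Handle();
        return h;
    }

    // Releases the current handle (if any) and takes ownership of h.
    void reset(Handle h = Handle()) {
        if (h_ != Handle()) {
            d_(h_);
        }
        h_ = h;
    }

private:
    Handle h_;
    Deleter d_;
};
```

On Windows, a deleter along the lines of struct GdiDeleter { void operator()(HGDIOBJ h) const { DeleteObject(h); } }; would let you write ScopedHandle<HBRUSH, GdiDeleter> brush(CreateSolidBrush(RGB(255, 0, 0)));. Keep in mind that DeleteObject should not be called on an object that is still selected into a device context, so select the old object back before the wrapper goes out of scope.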

Download the Debugging Tools for Windows package. Set the symbol paths correctly, then run your application under WinDbg. At some point it will break on the exception (an access violation, for example). Then run the command "!analyze -v", which is quite smart and should give you a hint about what's going wrong.

Most heisenbugs / release-only bugs are due to either flow of control that depends on reads from uninitialised memory / stale pointers / past end of buffers, or race conditions, or both.

Try overriding your allocators so they zero out memory when allocating. Does the problem go away (or become more reproducible)?
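A minimal sketch of that idea for C++ code that allocates with new: replace the global operator new and operator delete so every allocation comes back zero-filled. This is a diagnostic experiment, not something to ship, and memory obtained from malloc, HeapAlloc or GDI itself is not affected. If zero-filling makes the crash disappear, an uninitialized read is a likely suspect; filling with a non-zero pattern instead (the Visual C++ Debug heap uses 0xCD for fresh allocations) is a useful complementary test.

```cpp
#include <cstdlib>
#include <cstring>
#include <new>

// Replacement global allocation function (diagnostic only): every block
// handed out by new is zero-filled, so code that reads "uninitialized"
// heap memory sees the same value on every run. By default, operator new[]
// and the nothrow forms forward to this function.
void* operator new(std::size_t size) {
    if (size == 0) {
        size = 1;                     // new must return a unique non-null pointer
    }
    void* p = std::malloc(size);
    if (p == nullptr) {
        throw std::bad_alloc();
    }
    std::memset(p, 0, size);
    return p;
}

// Matching deallocation; operator delete[] forwards here by default.
void operator delete(void* p) noexcept {
    std::free(p);
}
```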

Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?

Stack overflow! ;)

4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?

I've found the cause of numerous "strange crashes" to be dereferencing a broken this pointer inside a member function of the object.

What does the crash report say? An access violation? Some other exception? That would be the next clue for solving this.

Ensure you have no preceding memory corruption, using PageHeap.exe.

Ensure you have no stack overflow (for example, a huge local array such as CBig array[1000000]).

Ensure that you have no uninitialized memory.
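The stack-overflow and uninitialized-memory checks above often share a fix: move large buffers off the stack and value-initialize them. A hypothetical sketch, using a made-up CBig of 1024 bytes, so that CBig array[1000000] as a local would need roughly 1 GB of stack, against the 1 MB that Visual C++ reserves for a thread's stack by default. make_records is an illustrative name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical large record standing in for CBig (sizeof == 1024 on
// common platforms).
struct CBig {
    int id;
    char payload[1020];
};

// Risky:  CBig array[1000000];  as a local variable overflows the stack.
//
// Safer: allocate on the heap. std::vector<CBig>(n) also value-initializes
// every element, so id is 0 and payload is zero-filled instead of garbage.
std::vector<CBig> make_records(std::size_t n) {
    return std::vector<CBig>(n);
}
```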

You can also run the Release version inside the debugger once you generate debug symbols for it (which is not the same as creating a Debug build). Step through it and see whether any warnings appear in the debugger's output window.

"4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?"

This could be a sign that the hardware is in fact faulty or being pushed too hard. Find out if they've overclocked their computer.

When I get this type of thing, I try running the code through Gimpel's PC-Lint (static code analysis), as it checks for different classes of errors than BoundsChecker does. If you are using BoundsChecker, turn on the memory poisoning options.

You mention AMD CPUs. Have you investigated whether there is a similar graphics card, driver version and/or configuration on the machines that crash? Does it always crash on these machines or just occasionally? Maybe run the System Information tool on these machines and see what they have in common.

Sounds like stack corruption to me. My favorite tool for tracking that down is IDA Pro. Of course, you don't have that kind of access to the user's machine.

Some memory checkers have a hard time catching stack corruption (if that is indeed what this is). I think the surest way to catch it is runtime analysis.

This can also be caused by corruption in an exception path, even if the exception was handled. Do you debug with "catch first-chance exceptions" turned on? You should keep it on for as long as you can, although in many cases it does get annoying after a while.

Can you send those users a checked version of your application? Check out minidumps: handle that exception and write out a dump. Then use WinDbg to debug it on your end.

Another method is writing very detailed logs. Create a "Log every single action" option, and ask the user to turn it on and send the log to you. Dump memory contents to the logs. Check out _CrtDbgReport() on MSDN.
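When logging to locate a crash, make sure each line actually leaves the process before the next statement runs. Output still sitting in a C runtime buffer is lost when the process dies, which can make a log appear to stop at an innocent line such as a local variable declaration. A minimal sketch of a flushing log helper; log_line is a hypothetical name, not part of the CRT or of _CrtDbgReport.

```cpp
#include <cstdarg>
#include <cstdio>

// Appends one printf-style formatted line to the log and flushes it
// immediately, so the line is handed to the operating system before the
// caller continues and survives a crash of the process.
void log_line(std::FILE* log, const char* fmt, ...) {
    if (log == nullptr) {
        return;
    }
    va_list args;
    va_start(args, fmt);
    std::vfprintf(log, fmt, args);
    va_end(args);
    std::fputc('\n', log);
    std::fflush(log);
}
```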

Good Luck!

EDIT:

Responding to your comment: An error on a local variable declaration is not surprising to me. I've seen this a lot. It's usually due to a corrupted stack.

Some variable on the stack may be running over its boundaries, for example. All hell breaks loose after that: stack variable declarations throw random memory errors, virtual tables get corrupted, etc.
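A typical way a stack variable runs over its boundaries is an unbounded string copy into a fixed-size local buffer. A minimal sketch of a bounded alternative; copy_bounded is a hypothetical helper (Visual C++ also provides strncpy_s, which truncates when passed _TRUNCATE).

```cpp
#include <cstddef>
#include <cstring>

// The kind of overrun described above (hypothetical):
//   char title[16];
//   strcpy(title, note_title);   // writes past title if note_title is long,
//                                // trampling neighbouring locals and the
//                                // saved return address
//
// A bounded copy never writes more than dest_size bytes and always
// null-terminates when dest_size > 0. Returns the number of characters copied.
std::size_t copy_bounded(char* dest, std::size_t dest_size, const char* src) {
    if (dest_size == 0) {
        return 0;
    }
    std::size_t n = std::strlen(src);
    if (n >= dest_size) {
        n = dest_size - 1;            // truncate to leave room for '\0'
    }
    std::memcpy(dest, src, n);
    dest[n] = '\0';
    return n;
}
```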

Whenever I've been stuck on one of those for a prolonged period of time, I've had to go to IDA Pro. Detailed runtime disassembly debugging is the only thing I know of that catches them reliably.

Many developers use WinDbg for this kind of analysis. That's why I also suggested minidumps.

Try Rational (IBM) PurifyPlus. It catches a lot of errors that BoundsChecker doesn't.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow