Question

I have the following basic questions:

  • When should we involve disassembly in debugging?

  • How do we interpret disassembly? For example, what does each segment below stand for?

00637CE3 8B 55 08             mov         edx,dword ptr [arItem]
00637CE6 52                   push        edx
00637CE7 6A 00                push        0
00637CE9 8B 45 EC             mov         eax,dword ptr [result]
00637CEC 50                   push        eax
00637CED E8 3E E3 FF FF       call        getRequiredFields (00636030)
00637CF2 83 C4 0C             add 

Language : C++

Platform : Windows


Solution

Disassembly is quite useful for estimating how efficient the code emitted by the compiler is.

For example, if you use std::vector::operator[] in a loop, it is hard to guess without the disassembly that each call to operator[] can require two memory accesses (one to load the vector's data pointer, one to load the element), whereas an iterator doing the same work would require only one.
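To check this claim on your own compiler, a minimal sketch like the following can be compiled and its listing inspected (for instance with the MSVC /FAs switch or g++ -S). The function names sum_by_index and sum_by_iterator are made up for illustration. With optimizations enabled, many compilers emit nearly identical loops for both, so the difference is most visible in unoptimized builds.

```cpp
#include <cstddef>
#include <vector>

// Indexed loop: in an unoptimized build, each v[i] typically reloads the
// vector's internal data pointer and then loads the element.
long sum_by_index(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// Iterator loop: the iterator already holds the element's address,
// so dereferencing it is a single load.
long sum_by_iterator(const std::vector<int>& v) {
    long total = 0;
    for (auto it = v.begin(); it != v.end(); ++it) {
        total += *it;
    }
    return total;
}
```

Both functions return the same sum; only the generated instructions differ, which is exactly what the disassembly lets you compare.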

In your example:

mov         edx,dword ptr [arItem] // the value stored at address "arItem" is loaded into the edx register
push        edx // that register is pushed onto the stack
push        0 // zero is pushed onto the stack
mov         eax,dword ptr [result] // the value stored at address "result" is loaded into the eax register
push        eax // that register is pushed onto the stack
call        getRequiredFields (00636030) // the getRequiredFields function is called

This is a typical sequence for calling a function: the parameters are pushed onto the stack (last argument first), and then control is transferred to the function's code by the call instruction.
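As for the columns in the listing: the first is the instruction's address, the second is the raw machine-code bytes, and the rest is the mnemonic with its operands. The bytes on the last line of the question, 83 C4 0C, encode add esp,0Ch, which removes the three 4-byte arguments (12 bytes) from the stack after the call returns. A hedged C++ sketch of source that could produce such a sequence in a 32-bit build follows; the types Item and Result, the int return type, the function body, and the name caller are all assumptions, since the question only shows the call site.

```cpp
// Hypothetical stand-ins for the real types, which the question does not show.
struct Item   { int id; };
struct Result { int count; };

// Assumed signature; only the argument order (result, 0, arItem) can be read
// from the disassembly. With the default 32-bit convention (cdecl), the caller
// pushes arguments right to left and cleans up the stack afterwards.
int getRequiredFields(Result* result, int flags, Item* arItem) {
    result->count = arItem->id + flags;
    return result->count;
}

// arItem is a parameter of the caller and result is a local pointer variable,
// matching the [arItem] and [result] operands in the listing.
int caller(Item* arItem) {
    Result storage{};
    Result* result = &storage;
    // push arItem; push 0; push result; call getRequiredFields; add esp,0Ch
    return getRequiredFields(result, 0, arItem);
}
```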

Disassembly is also quite useful when taking part in arguments about "how it works after compilation", as caf points out in his answer to this question.

OTHER TIPS

1 - We should (or at least I do) involve disassembly in debugging only as a last resort. Generally, an optimizing compiler generates code that is not trivial for a human to follow: instructions are reordered, dead code is eliminated, some functions are inlined, and so on. So understanding disassembled code is rarely necessary, and it is not easy when it is necessary. One case where I do look at the disassembly is to see whether constants are encoded as part of the instruction or are stored in const variables.
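A small sketch of the constants case, with made-up names (kLimit, kHwLimit, clamp_immediate, clamp_loaded). On typical compilers the plain const is folded into the compare instruction as an immediate operand, while the volatile one has to be read from memory on every use; the listing for your own compiler and flags is what actually settles it.

```cpp
// Plain compile-time constant: usually encoded directly in the instruction.
const int kLimit = 100;

// volatile forces a real memory read each time the value is used.
volatile const int kHwLimit = 100;

int clamp_immediate(int x) {
    return x > kLimit ? kLimit : x;   // expect something like: cmp reg, 64h
}

int clamp_loaded(int x) {
    int limit = kHwLimit;             // expect a load from kHwLimit's address
    return x > limit ? limit : x;
}
```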

2 - That piece of code calls a function, something like getRequiredFields(result, 0, arItem). To read it, you have to learn the assembly language of the processor you are targeting. For x86, go to www.intel.com and get the IA-32 manuals.

When you should involve disassembly: When you want to know exactly what the CPU is doing when it executes your program, or when you don't have the source code in whatever higher-level language the program was written in (C++ in your case).

How to interpret assembly code: Learn assembly language. You can find an exhaustive reference on Intel x86 CPU instructions in Intel's processor manuals.

The piece of code that you posted prepares the arguments for a function call (by loading values into the registers edx and eax and pushing them, together with the constant 0, onto the stack), and then calls the function getRequiredFields.

I started out in 1982 with assembly debugging of PL/M programs on CP/M-80 and later Digital Research OSes. It was the same in the early days of MS-DOS, until Microsoft introduced symdeb, a command-line debugger that displayed source and assembly simultaneously. Symdeb was a leap forward, but not that great a one, since the earlier debuggers had already forced me to learn to recognize which assembly code belonged to which source line. Before CodeView, the best debugger was pfix86 from Phoenix Technologies. NuMega's SoftICE was the best debugger (apart from pure hardware ICEs) I've ever come across, in that it not only debugged my application but effortlessly led me through the inner workings of Windows as well. But I digress.

Late in 1990 a consultant on a project I was working on approached me and said he had this (very early) C++ bug he'd been working on for days, but he couldn't understand what the problem was. He single-stepped through the source code (on a windowed non-graphic DOS debugger) for me while I got all impatient. Finally I interrupted him, looked through the debugger options, and sure enough there was the mixed source/assembly mode with registers and everything. This made it easy to see that the application was trying to free an internal pointer (for local variables) containing NULL. For this problem, the source code mode was of no help at all. Today's C++ compilers will probably no longer contain a bug such as this, but there will be others.

Knowing assembly-level debugging allows you to understand the source-compiler-assembly relationship to the extent of being able to predict what code the compiler will generate. Many people here on Stack Overflow say "profile-profile-profile", but this goes a step further, in that you learn which source-code constructs (I write in C) to use when, and which to avoid. I suspect this is even more important with C++, which can generate a lot of code without the developer suspecting anything. For example, there is a standard class for handling lists of objects which appears to be without drawbacks (just a few lines of code and this fantastic functionality!) until you look at the scores of strange procedure calls it generates. I'm not saying it's wrong to use them; I'm just saying that the developer should be aware of the pros and cons of using them. Overloading operators may be great functionality (somewhat weird to a WYSIWYG programmer like me), but what is the price in execution speed? If you say "nothing", I say "prove it."
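One way to take up the "prove it" challenge is to compile two equivalent functions and compare their listings. The sketch below is only an illustration with made-up names (Vec2, add_overloaded, add_manual). In an unoptimized build the overloaded version typically shows a call to operator+, while an optimizing build usually inlines it so that both functions come out the same; whether that holds for your compiler and settings is what the disassembly shows.

```cpp
// Hypothetical two-component value type used only for this comparison.
struct Vec2 {
    int x;
    int y;
};

// Overloaded operator: a candidate for inlining by the optimizer.
inline Vec2 operator+(Vec2 a, Vec2 b) {
    return Vec2{a.x + b.x, a.y + b.y};
}

// Uses the overloaded operator.
Vec2 add_overloaded(Vec2 a, Vec2 b) {
    return a + b;
}

// Does the same work by hand, with no operator call in the source.
Vec2 add_manual(Vec2 a, Vec2 b) {
    Vec2 r;
    r.x = a.x + b.x;
    r.y = a.y + b.y;
    return r;
}
```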

It is never wrong to use mixed or pure assembly mode when debugging. Difficult bugs will usually be easier to find and the developer will learn to write more efficient code. Developers from the interpreted camp (C# and Java) will say that their code is just as efficient as the compiled languages but if you know assembly you will also know why they are wrong, why they are dead wrong. You can smile and think "yeah, tell me about it!"

After you've worked with different compilers, you will come across one with the most astonishing code-generation ability. One PowerPC compiler condensed three nested loops into one loop simply through the superior code interpretation of its optimizer. Next to the guy who wrote that I'm ... well, let's just say in a different league.

Up until about ten years ago I wrote quite a bit of pure assembly, but with multi-stage pipelines, multiple execution units, and now multiple cores to contend with, the C compiler beats me hands down. On the other hand, I know what the compiler can do a good job with and what it shouldn't have to work with: Garbage In still equals Garbage Out. This is true for any compiler that produces assembly output.

Licensed under: CC-BY-SA with attribution