Question

#include <iostream>
#include <vector>
using namespace std;

int main()
{
    vector< vector<int> > dp(50000, vector<int>(4, -1));
    cout << dp.size();
}

This tiny program takes a split second to execute when simply run from the command line. But when run in a debugger, it takes over 8 seconds. Pausing the debugger reveals that it is in the middle of destroying all those vectors. WTF?

Note - Visual Studio 2008 SP1, Core 2 Duo 6700 CPU with 2GB of RAM.

Added: To clarify, no, I'm not confusing Debug and Release builds. These results are from one and the same .exe, without any recompiling in between. In fact, switching between Debug and Release builds changes nothing.


Solution

Running in the debugger changes the memory allocation library used to one that does a lot more checking. A program that does nothing but memory allocation and de-allocation is going to suffer much more than a "normal" program.
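To see where the time goes, construction and destruction can be timed separately. The following is a minimal sketch, not a rigorous benchmark: the names HeapTimings and time_vector_lifetime are hypothetical, and it assumes a compiler with C++11 <chrono> (newer than the Visual Studio 2008 used in the question).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical result type: wall-clock milliseconds for each phase.
struct HeapTimings {
    double build_ms;
    double destroy_ms;
};

// Builds `outer` vectors of `inner` ints, then times their destruction.
HeapTimings time_vector_lifetime(std::size_t outer, std::size_t inner)
{
    typedef std::chrono::steady_clock clock;
    typedef std::chrono::duration<double, std::milli> ms;

    HeapTimings t;
    clock::time_point start = clock::now();
    {
        std::vector< std::vector<int> > dp(outer, std::vector<int>(inner, -1));
        assert(dp.size() == outer);
        t.build_ms = ms(clock::now() - start).count();
        start = clock::now();  // restart the clock just before dp is destroyed
    }  // dp goes out of scope here: one deallocation per inner vector
    t.destroy_ms = ms(clock::now() - start).count();
    return t;
}
```

Calling time_vector_lifetime(50000, 4) from main and printing both fields, once started under the debugger and once without it, should show whether the extra time is concentrated in the destruction phase.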

Edit: Having just tried running your program under VS, I get a call stack that looks like this:

ntdll.dll!_RtlpValidateHeapEntry@12()  + 0x117 bytes    
ntdll.dll!_RtlDebugFreeHeap@12()  + 0x97 bytes  
ntdll.dll!_RtlFreeHeapSlowly@12()  + 0x228bf bytes  
ntdll.dll!_RtlFreeHeap@12()  + 0x17646 bytes    
msvcr90d.dll!_free_base(void * pBlock=0x0061f6e8)  Line 109 + 0x13 bytes
msvcr90d.dll!_free_dbg_nolock(void * pUserData=0x0061f708, int nBlockUse=1)
msvcr90d.dll!_free_dbg(void * pUserData=0x0061f708, int nBlockUse=1) 
msvcr90d.dll!operator delete(void * pUserData=0x0061f708)
desc.exe!std::allocator<int>::deallocate(int * _Ptr=0x0061f708, unsigned int __formal=4)
desc.exe!std::vector<int,std::allocator<int> >::_Tidy()  Line 1134  C++

This shows the debug heap functions in ntdll.dll and the debug C runtime (msvcr90d.dll) being used.

OTHER TIPS

Running a program with the debugger attached is always slower than without.

This must be caused by VS hooking into the new/delete calls and doing more checking when attached, or by the runtime library using the IsDebuggerPresent API and behaving differently in that case.

You can easily try this from inside Visual Studio: start the program with Debug->Start Debugging or with Debug->Start Without Debugging. Starting without debugging behaves like running from the command line, with exactly the same build configuration and executable.

The debug heap automatically gets enabled when you start your program in the debugger, as opposed to attaching to an already-running program with the debugger.

The book Advanced Windows Debugging by Mario Hewardt and Daniel Pravat has some decent information about the Windows heap, and it turns out that the chapter on heaps is up on the web site as a sample chapter.

Page 281 has a sidebar about "Attaching Versus Starting the Process Under the Debugger":

When starting the process under the debugger, the heap manager modifies all requests to create new heaps and change the heap creation flags to enable debug-friendly heaps (unless the _NO_DEBUG_HEAP environment variable is set to 1). In comparison, attaching to an already-running process, the heaps in the process have already been created using default heap creation flags and will not have the debug-friendly flags set (unless explicitly set by the application).
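As a rough illustration of the _NO_DEBUG_HEAP switch mentioned in the sidebar, a program can report whether it was launched with the variable set. This is only a sketch: env_equals and no_debug_heap_requested are hypothetical helper names, and the check only reads the environment. It does not change the heap, and setting the variable from inside the already-running process would presumably be too late to have any effect.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Returns true when the environment variable `name` exists and equals `expected`.
bool env_equals(const char* name, const char* expected)
{
    assert(name != NULL && expected != NULL);
    const char* value = std::getenv(name);  // NULL when the variable is unset
    return value != NULL && std::strcmp(value, expected) == 0;
}

// Hypothetical helper: true when this process was started with _NO_DEBUG_HEAP=1.
bool no_debug_heap_requested()
{
    return env_equals("_NO_DEBUG_HEAP", "1");
}
```

In Visual Studio the variable can be supplied to the debuggee through the project's Debugging > Environment property, as _NO_DEBUG_HEAP=1.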

(Also: a semi-related question, where I posted part of this answer before.)

It's definitely HeapFree that's slowing this down; you can get the same effect with the program below.

Passing parameters like HEAP_NO_SERIALIZE to HeapFree doesn't help either.

#include "stdafx.h"
#include <iostream>
#include <windows.h>

using namespace std;


int _tmain(int argc, _TCHAR* argv[])
{
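    // Private heap, so only this test's allocations are involved.
    HANDLE heap = HeapCreate(0, 0, 0);

    void** pointers = new void*[50000];

    // Allocate 50000 small blocks, the same size as vector<int>(4).
    int i = 0;
    for (i = 0; i < 50000; ++i)
    {
        pointers[i] = HeapAlloc(heap, 0, 4 * sizeof(int));
    }

    cout << i;

    // Free them in reverse order; this is the slow part under the debugger.
    for (i = 49999; i >= 0; --i)
    {
        HeapFree(heap, 0, pointers[i]);
    }

    cout << "!";

    delete [] pointers;

    HeapDestroy(heap);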
}

http://www.symantec.com/connect/articles/windows-anti-debug-reference

Read sections 2 "PEB!NtGlobalFlags" and 2 "Heap flags".

I think this may explain it...
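For orientation, the heap-related bits of NtGlobalFlag can be decoded with a few lines of code. The sketch below assumes the flag values given in the GFlags documentation (0x10, 0x20, 0x40); describe_heap_flags is a hypothetical helper, and the value to decode has to be obtained separately, for example from a debugger. The combination 0x70 is the one commonly reported for a process started under a debugger.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// One named bit of NtGlobalFlag. Values are taken from the GFlags
// documentation and written out here so no Windows header is needed.
struct FlagName {
    unsigned long bit;
    const char* name;
};

// Lists the heap-checking flags that are set in `nt_global_flag`.
std::string describe_heap_flags(unsigned long nt_global_flag)
{
    static const FlagName table[] = {
        { 0x10, "FLG_HEAP_ENABLE_TAIL_CHECK" },
        { 0x20, "FLG_HEAP_ENABLE_FREE_CHECK" },
        { 0x40, "FLG_HEAP_VALIDATE_PARAMETERS" }
    };

    std::string out;
    for (std::size_t i = 0; i < sizeof(table) / sizeof(table[0]); ++i) {
        if (nt_global_flag & table[i].bit) {
            if (!out.empty()) out += " | ";
            out += table[i].name;
        }
    }
    if (out.empty()) out = "(none)";
    return out;
}

// Usage example: all three heap checks set versus none.
void example()
{
    assert(describe_heap_flags(0x70) ==
           "FLG_HEAP_ENABLE_TAIL_CHECK | FLG_HEAP_ENABLE_FREE_CHECK | "
           "FLG_HEAP_VALIDATE_PARAMETERS");
    assert(describe_heap_flags(0) == "(none)");
}
```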


EDIT: added solution

In your handler for CREATE_PROCESS_DEBUG_EVENT, add the following (cpdi here appears to be the event's CREATE_PROCESS_DEBUG_INFO):

// hack 'Load Configuration Directory' in exe header to point to a new block that specifies GlobalFlags 
IMAGE_DOS_HEADER dos_header;
ReadProcessMemory(cpdi.hProcess,cpdi.lpBaseOfImage,&dos_header,sizeof(IMAGE_DOS_HEADER),NULL);
IMAGE_OPTIONAL_HEADER32 pe_header;
ReadProcessMemory(cpdi.hProcess,(BYTE*)cpdi.lpBaseOfImage+dos_header.e_lfanew+4+sizeof(IMAGE_FILE_HEADER),&pe_header,offsetof(IMAGE_OPTIONAL_HEADER32,DataDirectory),NULL);
IMAGE_LOAD_CONFIG_DIRECTORY32 ilcd;
ZeroMemory(&ilcd,sizeof(ilcd));
ilcd.Size = 64; // not sizeof(ilcd), as 2000/XP didn't have SEHandler
ilcd.GlobalFlagsClear = 0xffffffff; // clear all flags.  this is as we don't want dbg heap
BYTE *p = (BYTE *)VirtualAllocEx(cpdi.hProcess,NULL,ilcd.Size,MEM_COMMIT|MEM_RESERVE,PAGE_READWRITE);
WriteProcessMemory(cpdi.hProcess,p,&ilcd,ilcd.Size,NULL);
BYTE *dde = (BYTE*)cpdi.lpBaseOfImage+dos_header.e_lfanew+4+sizeof(IMAGE_FILE_HEADER)+offsetof(IMAGE_OPTIONAL_HEADER32,DataDirectory)+sizeof(IMAGE_DATA_DIRECTORY)*IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG;
IMAGE_DATA_DIRECTORY temp;
temp.VirtualAddress = (DWORD)(p - (BYTE*)cpdi.lpBaseOfImage); // RVA of the new block; lpBaseOfImage is a void*, so cast before subtracting
temp.Size = ilcd.Size;
DWORD oldprotect;
VirtualProtectEx(cpdi.hProcess,dde,sizeof(temp),PAGE_READWRITE,&oldprotect);
WriteProcessMemory(cpdi.hProcess,dde,&temp,sizeof(temp),NULL);
VirtualProtectEx(cpdi.hProcess,dde,sizeof(temp),oldprotect,&oldprotect);
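The address arithmetic behind dde in the snippet above can be checked by hand. The sketch below writes the 32-bit PE sizes out as plain numbers so it compiles without <windows.h>. The constant names and load_config_entry_offset are hypothetical, and the figures assume the standard IMAGE_OPTIONAL_HEADER32 layout.

```cpp
#include <cassert>
#include <cstddef>

// 32-bit PE layout figures, mirroring the expressions used to compute dde.
const std::size_t kPeSignatureSize        = 4;   // "PE\0\0" signature
const std::size_t kFileHeaderSize         = 20;  // sizeof(IMAGE_FILE_HEADER)
const std::size_t kDataDirectoryOffset    = 96;  // offsetof(IMAGE_OPTIONAL_HEADER32, DataDirectory)
const std::size_t kDataDirectoryEntrySize = 8;   // sizeof(IMAGE_DATA_DIRECTORY)
const std::size_t kLoadConfigIndex        = 10;  // IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG

// Offset, from the image base, of the load-config data directory entry.
std::size_t load_config_entry_offset(long e_lfanew)
{
    assert(e_lfanew >= 0);
    return static_cast<std::size_t>(e_lfanew)
         + kPeSignatureSize
         + kFileHeaderSize
         + kDataDirectoryOffset
         + kDataDirectoryEntrySize * kLoadConfigIndex;
}
```

With these figures the entry sits 200 bytes past e_lfanew, so an image whose e_lfanew is 0x80 would have it at offset 328 from the image base.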

Yeah, WTF indeed.

You know your compiler will optimize a lot of those function calls by inlining them, and then further optimize the code there to exclude anything that isn't actually doing anything, which in the case of vectors of int will mean: pretty much not a lot.

In debug mode, inlining is not turned on because that would make debugging awful.

This is a nice example of how fast C++ code can really be.

8 seconds?? I tried the same in Debug mode. Not more than half a second I guess. Are you sure it's the destructors?

FYI. Visual Studio 2008 SP1, Core 2 Duo 6700 CPU with 2GB of RAM.

Makes no sense to me: attaching a debugger to a random binary in a normal configuration should mostly just trap breakpoint interrupts (asm int 3, etc).

Licensed under: CC-BY-SA with attribution