Question

We are using the now-deprecated Windows Azure Accelerator to deploy multiple applications to a Windows Azure web role. We have noticed a large memory leak in the WAIISHost.exe process: it is currently consuming 2.5 GB of RAM (on a Large Azure instance). One week ago it was at 1.5 GB, so it appears to be leaking roughly 1 GB per week.

We've looked at a memory dump and the leak appears to be unmanaged: SOS in WinDbg showed no more than 50 MB of managed heap.

We've used the heap_stat.py WinDbg extension, and it showed that most of the allocated objects come from nativerd.dll (which I believe is an internal infrastructure library). Here is the output of !py heap_stat.py -stat:

Statistics:

                                     Type name         Count  Size
                     nativerd!SCHEMA_ATTRIBUTE       8127384  Unknown
                      nativerd!ATTRIBUTE_VALUE       8127037  Unknown
                       nativerd!SCHEMA_ELEMENT       2032263  Unknown
                       nativerd!CONFIG_ELEMENT       1112616  Unknown
                      nativerd!NAMED_ENTRY_KEY         99967  Unknown
                      nativerd!DICTIONARY_LIST         54152  Unknown
                      nativerd!DUPLICATE_TABLE         11654  Unknown
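To make the counts above easier to compare, here is a minimal sketch that parses the pasted statistics and prints each type's share of the total object count. It only reads the text shown in this post; the names `parse_heap_stat` and `share_by_type` are made up for illustration and are not part of heap_stat.py. Because the Size column is reported as Unknown, the shares are by object count, not by bytes.

```python
import re

# Rows copied verbatim from the `!py heap_stat.py -stat` output above.
HEAP_STAT_OUTPUT = """\
                     nativerd!SCHEMA_ATTRIBUTE       8127384  Unknown
                      nativerd!ATTRIBUTE_VALUE       8127037  Unknown
                       nativerd!SCHEMA_ELEMENT       2032263  Unknown
                       nativerd!CONFIG_ELEMENT       1112616  Unknown
                      nativerd!NAMED_ENTRY_KEY         99967  Unknown
                      nativerd!DICTIONARY_LIST         54152  Unknown
                      nativerd!DUPLICATE_TABLE         11654  Unknown
"""

# A data row looks like: <module>!<type>  <count>  <size>
ROW = re.compile(r"^\s*(?P<type>\S+!\S+)\s+(?P<count>\d+)\s+(?P<size>\S+)\s*$")


def parse_heap_stat(text):
    """Return (type_name, count) pairs for every line that matches ROW."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append((match.group("type"), int(match.group("count"))))
    return rows


def share_by_type(rows):
    """Return (type_name, count, fraction_of_total), largest count first."""
    total = sum(count for _, count in rows)
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(name, count, count / total) for name, count in ordered]


rows = parse_heap_stat(HEAP_STAT_OUTPUT)
shares = share_by_type(rows)
for name, count, share in shares:
    print(f"{name:<32}{count:>10}{share:>8.1%}")
```

Running the same script against dumps taken a few days apart would show which of these types is actually growing, rather than just which is largest at one point in time.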

Running !heap -p -a on any of those objects did not reveal much additional information:

0:000> !heap -p -a 000000002c1591e0

address 000000002c1591e0 found in
_HEAP @ 8d0000
          HEAP_ENTRY Size Prev Flags            UserPtr UserSize - state
    000000002c1591e0 0014 0000  [00]   000000002c1591f0    00130 - (busy)
      nativerd!SCHEMA_ELEMENT::`vftable'

At this point, we are wondering what the next steps in investigating the memory leak should be. Is there any other useful information that can be extracted from the memory dump, or should we resort to other means, such as inspecting the code and trying to run it locally with a profiler?

Update: Our VMs are running Windows Server 2008 R2 SP1, and we are using Azure SDK 1.7. The version of nativerd.dll is 7.5.7601.17855.


Solution

I've taken the deprecated Windows Azure Accelerator for Web Roles and given it some much-needed love. It has been upgraded to fix the problem you describe here and to run on Windows Server 2012 (it currently targets the 1.8 SDK, but if you know what you are doing it should work fine with 2.X).

You can check it out here: https://github.com/MRCollective/AzureWebFarm

The immediate fix for the problem you are experiencing is shown in this commit: https://github.com/MRCollective/AzureWebFarm/commit/467516c77fa23b23fa94f98deb38679cfd08663a. Alternatively, if you upgrade to Windows Server 2012, the problem no longer occurs.

Another option that we recently produced is here: https://github.com/MRCollective/AzureWebFarm.OctopusDeploy

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow