How do I debug or fix the endless loop and heap corruption issue involving boost::interprocess managed_shared_memory?

StackOverflow https://stackoverflow.com/questions/16651878

Question

I have the following "first-chance exception" message which is coming from a DLL I wrote which is running inside an executable that I did not write. That is, the DLL is a plugin. The first time this exception fires, an attempt to open a shared memory map file is failing. If I ignore first chance exceptions and just run, the application freezes or crashes eventually.

First-chance exception at 0x76a7c41f in notmyexe.exe: Microsoft C++ exception: boost::interprocess::interprocess_exception at memory location 0x002bc644..

After several hours of digging, it appears to be caused by a block of code that loops endlessly until an expected exception condition clears. If the condition never clears, the exception eventually turns into another low-level exception condition and/or into heap corruption. All of this happens just in an effort to open a shared memory area using boost::interprocess.

The first thing that complicates things is that in my Visual C++ 2008 project, the debugger does not show where the first boost::interprocess::interprocess_exception first-chance exception is thrown from, because Visual C++ 2008 cannot find the complex Boost template code in question. However, by single-stepping through the assembly language view, I found the code that blows up.

The top level line of my own code that it all starts to go bad on is:

  segment = new managed_shared_memory(   open_or_create
                                      ,  MEMORY_AREA_NAME
                                      , SHARED_AREA_SIZE );          

The above managed_shared_memory class is from interprocess_fwd.hpp, and is a standard part of the Boost shared memory API/headers. Because it's template based, the above expands into a C++ Boost template expression about 2K characters long, which is truncated at different lengths by the linker and by the debugger. It seems Visual C++ 2008 loses its source-level debugging capabilities when these limits are in play.

For example, when it blows up I get this call stack:

    KernelBase.dll!76a7c41f()   
    [Frames below may be incorrect and/or missing, no symbols loaded for KernelBase.dll]    
    KernelBase.dll!76a7c41f()   
>   msvcr90d.dll!_malloc_dbg(unsigned int nSize=2290875461, int nBlockUse=264, const char * szFileName=0x01fcb983, int nLine=1962999808)  Line 160 + 0x1b bytes C++
    8bfc4d89()  

No actual end-user written source functions appear in the stack dump above.

How should I debug this? Secondly, is there some known problem with boost-interprocess, with Visual C++ 2008? Third, what is the boost code below doing and why must it endlessly loop?

boost::interprocess::basic_managed_shared_memory<char,
   boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family,
        boost::interprocess::offset_ptr<void,int,unsigned int,0>,0>,
        boost::interprocess::iset_index>::basic_managed_shared_memory<char,boo...

Further layers down, we get to:

basic_managed_shared_memory (open_or_create_t,
                              const char *name, size_type size,
                              const void *addr = 0, const permissions& perm = permissions())
      : base_t()
      , base2_t(open_or_create, name, size, read_write, addr,
                create_open_func_t(get_this_pointer(),
                ipcdetail::DoOpenOrCreate), perm)
   {}  

Anyways, don't try to debug this at home kids, here's what happens:

[screenshot not included in this copy]

Finally, using my ninja-like ability to single step through several million lines of assembly language I have defeated Visual C++ 2008's evil debugger limitations, and have found the code in question.

This is what is blowing up in fact: create_device<FileBased>(dev....

Some context here: managed_open_or_create_impl.h line 351...

else if(type == DoOpenOrCreate){
         //This loop is very ugly, but brute force is sometimes better
         //than diplomacy. If someone knows how to open or create a
         //file and know if we have really created it or just open it
         //drop me a e-mail!
         bool completed = false;
         while(!completed){
            try{
               create_device<FileBased>(dev, id, size, perm, file_like_t()); // <-- KABOOM!
               created     = true;
               completed   = true;
            }
            catch(interprocess_exception &ex){
               if(ex.get_error_code() != already_exists_error){
                  throw;
               }
               else{
                  try{
                     DeviceAbstraction tmp(open_only, id, read_write);
                     dev.swap(tmp);
                     created     = false;
                     completed   = true;
                  }
                  catch(interprocess_exception &e){
                     if(e.get_error_code() != not_found_error){
                        throw;
                     }
                  }
                  catch(...){
                     throw;
                  }
               }
            }
            catch(...){
               throw;
            }
            thread_yield();
         }
      }

Solution

I believe I've had some of the same issues you are having. Take a look at the function "shared_memory_object::priv_open_or_create" in "\boost\interprocess\shared_memory_object.hpp". At the top of that function is another function call "create_tmp_and_clean_old_and_get_filename" that starts a function chain that winds up deleting the shared memory file. I wound up moving that function call lower in the priv_open_or_create function around where the case statements start. I believe I'm using boost 1.48. Here's the final version of that function that I modified:

inline bool shared_memory_object::priv_open_or_create
   (ipcdetail::create_enum_t type, const char *filename, mode_t mode, const permissions &perm)
{
   m_filename = filename;
   std::string shmfile;
   std::string root_tmp_name;

   //Set accesses
   if (mode != read_write && mode != read_only){
      error_info err = other_error;
      throw interprocess_exception(err);
   }

   switch(type){
      case ipcdetail::DoOpen:
         ipcdetail::get_tmp_base_dir(root_tmp_name);
         shmfile = root_tmp_name;
         shmfile += "/";
         shmfile += filename;
         m_handle = ipcdetail::open_existing_file(shmfile.c_str(), mode, true);
      break;
      case ipcdetail::DoCreate:
         ipcdetail::create_tmp_and_clean_old_and_get_filename(filename, shmfile);
         m_handle = ipcdetail::create_new_file(shmfile.c_str(), mode, perm, true);
      break;
      case ipcdetail::DoOpenOrCreate:
         ipcdetail::create_tmp_and_clean_old_and_get_filename(filename, shmfile);
         m_handle = ipcdetail::create_or_open_file(shmfile.c_str(), mode, perm, true);
      break;
      default:
         {
            error_info err = other_error;
            throw interprocess_exception(err);
         }
   }

   //Check for error
   if(m_handle == ipcdetail::invalid_file()){
      error_info err = system_error_code();
      this->priv_close();
      throw interprocess_exception(err);
   }

   m_mode = mode;
   return true;
}

BTW, if anyone knows the official channels I can go through to try and get this verified and added to boost please let me know as I hate modifying stuff like this without knowing its full effect.

Hope this helps!

OTHER TIPS

Boost is full of both amazing and scary things.

A simple workaround on Windows could be to switch to managed_windows_shared_memory instead of managed_shared_memory. This can solve a variety of nasty crash/hang problems. One sort of crash/hang problem appears to be caused by the differences between Windows file system behaviour and Unix file system behaviour; in particular, it seems that with Boost and managed_shared_memory on Windows, it is possible to run afoul of Windows file system locking limitations. I am informed that an effort to deal with this was completed in Boost 1.53, but I am using Boost 1.53 and I still have this problem.

With regular managed_shared_memory on Windows, you get persistence beyond the life of any of the client or server applications. This might be desirable in some cases, so the workaround is not a real fix for people who need it.

In my case, however, I didn't really need persistence. Although I had thought it would be handy, it turns out to be more pain than it's worth, at least with the current Boost implementation on Windows.

I would also like to point out that deletion of the shared memory file appears to be the root cause of the race condition behind the problem in the question above. Proper synchronization around creation, checking, and deletion of the file appears to be essential to a real-world implementation of the system. In particular, it appears to be a devastating problem if your master (server) deletes the shared memory file while some clients are still using it. A reboot appears necessary to clear the resulting lock + NTFS file system mess.
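
To illustrate the kind of guard that would stop the endless loop in the question, here is a minimal sketch of an open-or-create retry that gives up after a fixed number of attempts. It is not Boost code and I have not tried patching Boost with it: Attempt, Result, open_or_create_bounded and the two callbacks are hypothetical names standing in for the create/open steps and for the already_exists_error / not_found_error codes shown in the question. It only models the control flow.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <thread>

// Hypothetical outcome codes, standing in for Boost's
// already_exists_error and not_found_error.
enum class Attempt { Ok, AlreadyExists, NotFound };

enum class Result { Created, Opened };

// Try to create; if the file already exists, try to open it.
// Retry only when the file vanished between the two steps
// (someone deleted it), and give up after max_attempts rounds
// instead of spinning forever.
inline Result open_or_create_bounded(const std::function<Attempt()>& try_create,
                                     const std::function<Attempt()>& try_open,
                                     int max_attempts)
{
    assert(max_attempts > 0);
    for (int i = 0; i < max_attempts; ++i) {
        const Attempt c = try_create();
        if (c == Attempt::Ok) return Result::Created;
        if (c != Attempt::AlreadyExists)
            throw std::runtime_error("create failed");

        const Attempt o = try_open();
        if (o == Attempt::Ok) return Result::Opened;
        if (o != Attempt::NotFound)
            throw std::runtime_error("open failed");

        // Lost the race: the file existed, then disappeared. Yield and retry.
        std::this_thread::yield();
    }
    throw std::runtime_error("open_or_create: gave up after repeated create/open races");
}
```

With a cap like this, a client that keeps losing the create/open race gets an exception it can report, rather than hanging the host application.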

If I find a real solution I'll post it, but the above is more information than I could find anywhere else. Be wary of managed_shared_memory and consider using managed_windows_shared_memory and forget about trying to make the "persistent shared memory" idea work. Rather, use non-persistent windows-only managed_windows_shared_memory.

Solving this, while keeping the managed_shared_memory class in my application probably means wrapping all access to the managed_shared_memory object in yet another level of interprocess synchronization primitives, or even with a raw Win32 API mutex. Boost could do something equivalent, but probably would introduce yet more accidental complexity.
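
As a rough sketch of that wrapping idea, the helper below runs a piece of work under a lock but gives up after a timeout instead of waiting forever. It uses std::timed_mutex purely as a stand-in, and that only works inside one process. A real version would need a cross-process primitive, for example a Win32 mutex waited on with a timeout. with_timed_lock is a hypothetical name, not something from Boost.

```cpp
#include <chrono>
#include <mutex>

// Runs `work` while holding `m`, but only if the lock can be acquired
// within `timeout`. Returns false (without running `work`) on timeout,
// so the caller can report a stuck lock instead of freezing.
template <typename Work>
bool with_timed_lock(std::timed_mutex& m,
                     std::chrono::milliseconds timeout,
                     Work work)
{
    // This constructor calls m.try_lock_for(timeout).
    std::unique_lock<std::timed_mutex> lock(m, timeout);
    if (!lock.owns_lock()) {
        return false;
    }
    work();
    return true;  // lock is released when `lock` goes out of scope
}
```

This does nothing about the mutex inside the segment manager itself; it only keeps my own code from entering the shared memory calls when another user already appears to be stuck.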

(Aside: Am I the only one here who thinks that Template-All-the-things has been carried too far in general use, and especially in Boost, these days?)

Update 2: I have found an alternative way of freezing up managed_shared_memory, which in turn freezes up any app you use it from. I did not expect it to be so easy to create deadlocks with Boost, but it is. The mutex code inside the implementation will wait forever for a mutex that some other user of the managed shared memory has gone away without releasing. This endless wait for a mutex that is never going to be released is another deep design flaw in this Boost interprocess implementation, in which I have so far counted several serious design flaws, at least on Windows. Maybe it works beautifully on Linux.

The code that exhibits this is the find() method, called like this:

   boost::interprocess::managed_shared_memory * segment;
   std::pair<MyType*, std::size_t> f = segment->find<MyType>(name);

The only solution when you get here is to delete the shared memory area, after stopping or killing all hung processes that are waiting for this mutex.

Here is the stack trace for a mutex deadlock (aka endless wait, frozen task):

>   myapp.exe!boost::interprocess::winapi::sched_yield()  Line 998  C++
    myapp.exe!boost::interprocess::ipcdetail::thread_yield()  Line 60 + 0xe bytes   C++
    myapp.exe!boost::interprocess::ipcdetail::spin_mutex::lock()  Line 71   C++
    myapp.exe!boost::interprocess::ipcdetail::spin_recursive_mutex::lock()  Line 91 C++
    myapp.exe!boost::interprocess::interprocess_recursive_mutex::lock()  Line 161   C++
    myapp.exe!boost::interprocess::scoped_lock<boost::interprocess::interprocess_recursive_mutex>::lock()  Line 280 C++
    myapp.exe!boost::interprocess::segment_manager<char,boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family,boost::interprocess::offset_ptr<void,int,unsigned int,0>,0>,boost::interprocess::iset_index>::priv_get_lock(bool use_lock=true)  Line 1340   C++
    myapp.exe!boost::interprocess::segment_manager<char,boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family,boost::interprocess::offset_ptr<void,int,unsigned int,0>,0>,boost::interprocess::iset_index>::priv_generic_find<char>(const char * name=0x00394290, boost::interprocess::iset_index<boost::interprocess::ipcdetail::index_config<char,boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family,boost::interprocess::offset_ptr<void,int,unsigned int,0>,0> > > & index={...}, boost::interprocess::ipcdetail::in_place_interface & table={...}, unsigned int & length=1343657312, boost::interprocess::ipcdetail::bool_<1> is_intrusive={...}, bool use_lock=true)  Line 854 + 0x11 bytes  C++
    myapp.exe!boost::interprocess::segment_manager<char,boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family,boost::interprocess::offset_ptr<void,int,unsigned int,0>,0>,boost::interprocess::iset_index>::priv_find_impl<boost::container::map<AreaKeyType,DATA_AREA_DESC,std::less<AreaKeyType>,boost::interprocess::allocator<std::pair<AreaKeyType const ,DATA_AREA_DESC>,boost::interprocess::segment_manager<char,boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family,boost::interprocess::offset_ptr<void,int,unsigned int,0>,0>,boost::interprocess::iset_index> > > >(const char * name=0x00394290, bool lock=true)  Line 728 + 0x25 bytes    C++
    myapp.exe!boost::interprocess::segment_manager<char,boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family,boost::interprocess::offset_ptr<void,int,unsigned int,0>,0>,boost::interprocess::iset_index>::find<boost::container::map<AreaKeyType,DATA_AREA_DESC,std::less<AreaKeyType>,boost::interprocess::allocator<std::pair<AreaKeyType const ,DATA_AREA_DESC>,boost::interprocess::segment_manager<char,boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family,boost::interprocess::offset_ptr<void,int,unsigned int,0>,0>,boost::interprocess::iset_index> > > >(const char * name=0x00394290)  Line 423 + 0x1e bytes  C++
    myapp.exe!boost::interprocess::ipcdetail::basic_managed_memory_impl<char,boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family,boost::interprocess::offset_ptr<void,int,unsigned int,0>,0>,boost::interprocess::iset_index,8>::find<boost::container::map<AreaKeyType,DATA_AREA_DESC,std::less<AreaKeyType>,boost::interprocess::allocator<std::pair<AreaKeyType const ,DATA_AREA_DESC>,boost::interprocess::segment_manager<char,boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family,boost::interprocess::offset_ptr<void,int,unsigned int,0>,0>,boost::interprocess::iset_index> > > >(boost::interprocess::ipcdetail::char_ptr_holder<char> name={...})  Line 346 + 0x23 bytes   C++
    myapp.exe!boost::interprocess::basic_managed_shared_memory<char,boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family,boost::interprocess::offset_ptr<void,int,unsigned int,0>,0>,boost::interprocess::iset_index>::find<boost::container::map<AreaKeyType,DATA_AREA_DESC,std::less<AreaKeyType>,boost::interprocess::allocator<std::pair<AreaKeyType const ,DATA_AREA_DESC>,boost::interprocess::segment_manager<char,boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family,boost::interprocess::offset_ptr<void,int,unsigned int,0>,0>,boost::interprocess::iset_index> > > >(boost::interprocess::ipcdetail::char_ptr_holder<char> name={...})  Line 208 + 0x10 bytes  C++
    myapp.exe!CCommonMemory::AllocateOrFindAreaMap(const char * name=0x00394290)  Line 128  C++
Licensed under: CC-BY-SA with attribution