Question

Valgrind is reporting leaked blocks, apparently one per thread, in the following code:

#include <iostream>
#include <thread>
#include <mutex>
#include <list>
#include <chrono>

std::mutex cout_mutex;

struct Foo
{
    Foo() 
    { 
        std::lock_guard<std::mutex> lock( cout_mutex );
        std::cout << __PRETTY_FUNCTION__ << '\n'; 
    }

    ~Foo() 
    { 
        std::lock_guard<std::mutex> lock( cout_mutex );
        std::cout << __PRETTY_FUNCTION__ << '\n'; 
    }

    void 
    hello_world() 
    { 
        std::lock_guard<std::mutex> lock( cout_mutex );
        std::cout << __PRETTY_FUNCTION__ << '\n'; 
    }
};

void
hello_world_thread()
{
    thread_local Foo foo;

    // must access, or the thread local variable may not be instantiated
    foo.hello_world();

    // keep the thread around momentarily
    std::this_thread::sleep_for( std::chrono::milliseconds( 100 ) );
}

int main()
{
    for ( int i = 0; i < 100; ++i )
    {
        std::list<std::thread> threads;

        for ( int j = 0; j < 10; ++j )
        {
            std::thread thread( hello_world_thread );
            threads.push_back( std::move( thread ) );
        }

        while ( ! threads.empty() )
        {
            threads.front().join();
            threads.pop_front();
        }
    }
}

Compiler version:

$ g++ --version
g++ (GCC) 4.8.1
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

GCC build options:

--enable-shared
--enable-threads=posix
--enable-__cxa_atexit
--enable-clocale=gnu
--enable-cxx-flags='-fno-omit-frame-pointer -g3'
--enable-languages=c,c++
--enable-libstdcxx-time=rt
--enable-checking=release
--enable-build-with-cxx
--disable-werror
--disable-multilib
--disable-bootstrap
--with-system-zlib

Program compilation options:

g++ -std=gnu++11 -Og -g3 -Wall -Wextra -fno-omit-frame-pointer thread_local.cc

valgrind version:

$ valgrind --version
valgrind-3.8.1

Valgrind options:

valgrind --leak-check=full --verbose ./a.out > /dev/null

Tail-end of valgrind output:

==1786== HEAP SUMMARY:
==1786==     in use at exit: 24,000 bytes in 1,000 blocks
==1786==   total heap usage: 3,604 allocs, 2,604 frees, 287,616 bytes allocated
==1786== 
==1786== Searching for pointers to 1,000 not-freed blocks
==1786== Checked 215,720 bytes
==1786== 
==1786== 24,000 bytes in 1,000 blocks are definitely lost in loss record 1 of 1
==1786==    at 0x4C29969: operator new(unsigned long, std::nothrow_t const&) (vg_replace_malloc.c:329)
==1786==    by 0x4E8E53E: __cxa_thread_atexit (atexit_thread.cc:119)
==1786==    by 0x401036: hello_world_thread() (thread_local.cc:34)
==1786==    by 0x401416: std::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() (functional:1732)
==1786==    by 0x4EE4830: execute_native_thread_routine (thread.cc:84)
==1786==    by 0x5A10E99: start_thread (pthread_create.c:308)
==1786==    by 0x573DCCC: clone (clone.S:112)
==1786== 
==1786== LEAK SUMMARY:
==1786==    definitely lost: 24,000 bytes in 1,000 blocks
==1786==    indirectly lost: 0 bytes in 0 blocks
==1786==      possibly lost: 0 bytes in 0 blocks
==1786==    still reachable: 0 bytes in 0 blocks
==1786==         suppressed: 0 bytes in 0 blocks
==1786== 
==1786== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)
--1786-- 
--1786-- used_suppression:      2 dl-hack3-cond-1
==1786== 
==1786== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)

Constructors and destructors were run once for each thread:

$ ./a.out | grep 'Foo::Foo' | wc -l
1000

$ ./a.out | grep hello_world | wc -l
1000

$ ./a.out | grep 'Foo::~Foo' | wc -l
1000

Notes:

  • If you change the number of threads created, the number of leaked blocks matches the number of threads.
  • The code is structured in such a way that might permit resource reuse (i.e. the leaked block) if GCC were so implemented.
  • From the valgrind stacktrace, thread_local.cc:34 is the line: thread_local Foo foo;
  • Due to the sleep_for() call, a program run takes about 10 seconds or so.

Any idea if this memory leak is in GCC, a result of my config options, or is some bug in my program?

Was it helpful?

Solution

It seems that the leak comes from the dynamic initialization.

Here is an example with an int :

thread_local int num=4; //static initialization

The last example does not leak. I tried it with 2 threads and no leak at all.

But now :

int func()
{
    return 4;
}
thread_local int num2=func(); //dynamic initialization

This one leak ! With 2 threads it gives total heap uage: 8 allocs, 6 frees, 428 bytes allocated...

I would suggest you to use a workaround like :

thread_local Foo *foo = new Foo; //dynamic initialization

No forget at the end of the thread execution to do :

delete foo;

But the last example as one problem : What if the thread exit with error before your delete ? Leak again...

It seems that there is no great solution. Maybe we should report that to the g++ developers about that ?

OTHER TIPS

try removing thread_local and using the following code

    void
    hello_world_thread()
    {
        Foo foo;

        // must access, or the thread local variable may not be instantiated
        foo.hello_world();

        // keep the thread around momentarily
        std::this_thread::sleep_for( std::chrono::milliseconds( 100 ) );
    }

foo within hello_world_thread should be in the local stack for every thread. so every thread will maintain its own copy of foo. no need to explicitly marking it as thread_local. A thread_local should be used in a context when you have something like static or namespace level variable but you want each variable to maintain its own copy for every thread.

Regards Kajal

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top