Question

I want to use a vector with the custom allocator below, in which construct() and destroy() have an empty body:

struct MyAllocator : public std::allocator<char> {
    typedef std::allocator<char> Alloc;
    //void destroy(Alloc::pointer p) {} // pre-C++11
    //void construct(Alloc::pointer p, Alloc::const_reference val) {} // pre-C++11
    template< class U > void destroy(U* p) {}
    template< class U, class... Args > void construct(U* p, Args&&... args) {}
    template<typename U> struct rebind {typedef MyAllocator other;};
};

Now, for reasons I have explained in another question, the vector has to be resized several times in a loop. To simplify my performance tests, I wrote a very simple loop like the following:

std::vector<char, MyAllocator> v;
v.reserve(1000000); // or more. Make sure there is always enough allocated memory
while (true) {
    v.resize(1000000);
    // sleep for 10 ms
    v.clear(); // or v.resize(0);
}

I noticed that changing the size this way increases the CPU consumption from 30% to 80%, even though the allocator has empty construct() and destroy() member functions. Because of that, I would have expected a very minimal impact on performance, or none at all (with optimization enabled). How is that increase possible? A second question: when reading the memory after any resize, why do I see that the value of every char in the resized range is 0? I would expect some non-zero values, since construct() does nothing.

My environment is g++ 4.7.0 with -O3 optimization enabled, on an Intel dual-core PC with 4 GB of free memory. Apparently the calls to construct() could not be optimized out at all?


Solution

Updated

This is a complete rewrite. There was an error in the original post/my answer which made me benchmark the same allocator twice. Oops.

Well, I can see huge differences in performance. I made the following test bed, which takes several precautions to ensure the crucial operations aren't completely optimized away. I then verified (with -O0 -fno-inline) that the allocator's construct and destroy calls are made the expected number of times (they are):

#include <vector>
#include <memory>  // std::allocator
#include <cstdlib> // rand

template<typename T>
struct MyAllocator : public std::allocator<T> {
    typedef std::allocator<T> Alloc;
    //void destroy(Alloc::pointer p) {} // pre-C++11
    //void construct(Alloc::pointer p, Alloc::const_reference val) {} // pre-C++11
    template< class U > void destroy(U* p) {}
    template< class U, class... Args > void construct(U* p, Args&&... args) {}
    template<typename U> struct rebind {typedef MyAllocator other;};
};

int main()
{
    typedef char T;
#ifdef OWN_ALLOCATOR
    std::vector<T, MyAllocator<T> > v;
#else
    std::vector<T> v;
#endif
    volatile unsigned long long x = 0; // volatile sink so the reads cannot be optimized away
    v.reserve(1000000); // or more. Make sure there is always enough allocated memory
    for (auto i = 0ul; i < 1ul << 18; i++) {
        v.resize(1000000);
        x += v[rand() % v.size()]; // append ._x here when T is the NonTrivial type below
        v.clear(); // or v.resize(0);
    }
}

The timing difference is marked:

g++ -g -O3 -std=c++0x -I ~/custom/boost/ test.cpp -o test 

real    0m9.300s
user    0m9.289s
sys 0m0.000s

g++ -g -O3 -std=c++0x -DOWN_ALLOCATOR -I ~/custom/boost/ test.cpp -o test 

real    0m0.004s
user    0m0.000s
sys 0m0.000s

I can only assume that what you are seeing is related to the standard library optimizing allocator operations for char (it being a POD type).
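To illustrate the kind of shortcut I mean, here is a sketch (not the actual libstdc++ code; default_construct_n is a name I made up) of how a library can detect a trivial value type at compile time and fill the new elements in bulk instead of calling the allocator's construct() once per element:

#include <cstring>
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical helper: default-construct n elements starting at p.
// For trivial types the whole range is zero-filled in one go, so a
// user-provided construct() never runs; otherwise construct() is
// called element by element.
template <typename Alloc, typename T>
void default_construct_n(Alloc& alloc, T* p, std::size_t n)
{
    if (std::is_trivial<T>::value) {
        std::memset(p, 0, n * sizeof(T)); // bulk zero-fill fast path
    } else {
        for (std::size_t i = 0; i < n; ++i)
            alloc.construct(p + i);       // per-element slow path
    }
}

A bulk zero-fill like this would also be consistent with your second observation of every char reading as 0 after resize() (and freshly mapped pages from the OS are typically zeroed anyway).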

The timings get even farther apart when you use

struct NonTrivial
{
    NonTrivial() { _x = 42; }
    virtual ~NonTrivial() {}
    char _x;
};

typedef NonTrivial T;

In this case, the default allocator takes in excess of 2 minutes (still running), whereas the 'dummy' MyAllocator takes ~0.006s. (Note that this invokes undefined behaviour, because it reads elements that have never been properly initialized.)
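If you want to keep that comparison without the undefined read, one option (a sketch; it would replace the x += ... line inside the loop of the test bed above) is to read an element only in the build where the elements really have been constructed:

#ifdef OWN_ALLOCATOR
        x += v.size();                  // elements were never constructed, so don't read them
#else
        x += v[rand() % v.size()]._x;   // default allocator: elements are fully constructed
#endif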

OTHER TIPS

(With corrections thanks to GManNickG and Jonathan Wakely below)

In C++11, with the post-Standard correction proposed in N3346 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3346.pdf), resize() constructs the added elements through the custom allocator.

In earlier versions, resize() value initialises the elements added, which takes time.

These initialisation steps have nothing to do with memory allocation; they concern what is done to the memory after it has been allocated. With the older semantics, value initialisation is an unavoidable expense.
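Conceptually, the two behaviours differ roughly like this (a sketch of the semantics, not real library code; n, a, old_end and new_end are illustrative names):

// Pre-N3346 semantics: resize(n) appends copies of a value-initialized
// temporary, so with the default allocator every new char ends up 0.
v.insert(v.end(), n - v.size(), char());

// C++11 with N3346: resize(n) default-inserts each new element through
// the allocator, which for MyAllocator means the empty construct() and
// therefore no per-element work at all.
for (char* p = old_end; p != new_end; ++p)
    std::allocator_traits<MyAllocator>::construct(a, p);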

Given the state of C++11 Standards compliance in current compilers, it would be worth looking at your headers to see which approach is in use.
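One way to check without digging through the headers is to instrument construct() and see whether resize() calls it with no arguments (the C++11 default-insert behaviour) or with a value to copy from (the older value-initialising behaviour). This is only a sketch; ProbingAllocator and the counters are names I made up:

#include <iostream>
#include <memory>
#include <new>
#include <utility>
#include <vector>

static unsigned long g_no_arg_constructs = 0; // default-insert path (C++11 + N3346)
static unsigned long g_value_constructs  = 0; // copy-from-T() path (older behaviour)

template <typename T>
struct ProbingAllocator : std::allocator<T> {
    template <typename U> struct rebind { typedef ProbingAllocator<U> other; };
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args)
    {
        if (sizeof...(Args) == 0) ++g_no_arg_constructs;
        else                      ++g_value_constructs;
        ::new (static_cast<void*>(p)) U(std::forward<Args>(args)...); // still construct properly
    }
    template <typename U> void destroy(U* p) { p->~U(); }
};

int main()
{
    std::vector<char, ProbingAllocator<char> > v;
    v.resize(1000);
    std::cout << "no-arg constructs: " << g_no_arg_constructs << "\n"
              << "value constructs:  " << g_value_constructs  << "\n";
}

If both counters stay at zero, that in itself tells you the library is bypassing the allocator for this element type altogether.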

The value initialisation was sometimes unnecessary and inconvenient, but it also protected a lot of programs from unintended mistakes. For example, someone might think they can resize a std::vector<std::string> to have 100 "uninitialised" strings and then start assigning into them before reading from them, but a precondition of the assignment operator is that the object being assigned to has been properly constructed; otherwise it will likely find a garbage pointer and try to delete[] it. Only careful placement new-ing of each element can construct them safely. The API design errs on the side of robustness.
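A sketch of that trap, using the templated non-constructing MyAllocator<T> from the test bed above (the commented-out assignment is the undefined behaviour; placement new is the only safe way to bring such an element to life):

#include <new>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string, MyAllocator<std::string> > v;
    v.resize(100);                    // with MyAllocator, no std::string is ever constructed

    // v[0] = "oops";                 // undefined behaviour: operator= runs on raw memory
                                      // and will likely try to free a garbage pointer

    ::new (static_cast<void*>(&v[0])) std::string("fine"); // placement new constructs it safely
    v[0].~basic_string();             // and, since destroy() is also empty, it must be
                                      // destroyed manually before the memory is released
}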

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow