Question

I want to use a vector with the custom allocator below, in which construct() and destroy() have an empty body:

struct MyAllocator : public std::allocator<char> {
    typedef std::allocator<char> Alloc;
    //void destroy(Alloc::pointer p) {} // pre-C++11
    //void construct(Alloc::pointer p, Alloc::const_reference val) {} // pre-C++11
    template< class U > void destroy(U* p) {}
    template< class U, class... Args > void construct(U* p, Args&&... args) {}
    template<typename U> struct rebind {typedef MyAllocator other;};
};

Now, for reasons I have explained in another question, the vector has to be resized several times in a loop. To simplify my performance tests, I wrote a very simple loop like the following:

std::vector<char, MyAllocator> v;
v.reserve(1000000); // or more. Make sure there is always enough allocated memory
while (true) {
    v.resize(1000000);
    // sleep for 10 ms
    v.clear(); // or v.resize(0);
}

I noticed that changing the size this way increases the CPU consumption from 30% to 80%, even though the allocator has empty construct() and destroy() member functions. Because of that, I would have expected a very minimal impact on performance, or none at all (with optimization enabled). How is that increase possible? A second question: when reading the memory after any resize, why do I see that the value of every char in the resized range is 0? I would expect some non-zero values, since construct() does nothing.

My environment is g++ 4.7.0 with -O3 optimization enabled, on an Intel dual-core PC with 4 GB of free memory. Apparently the calls to construct() could not be optimized out at all?


Solution

Updated

This is a complete rewrite. There was an error in the original post/my answer which made me benchmark the same allocator twice. Oops.

Well, I can see huge differences in performance. I made the following test bed, which takes several precautions to ensure the crucial operations aren't completely optimized away. I then verified (with -O0 -fno-inline) that the allocator's construct and destroy calls are made the expected number of times (they are):

#include <vector>
#include <memory>  // std::allocator
#include <cstdlib> // rand

template<typename T>
struct MyAllocator : public std::allocator<T> {
    typedef std::allocator<T> Alloc;
    //void destroy(Alloc::pointer p) {} // pre-C++11
    //void construct(Alloc::pointer p, Alloc::const_reference val) {} // pre-C++11
    template< class U > void destroy(U* p) {}
    template< class U, class... Args > void construct(U* p, Args&&... args) {}
    template<typename U> struct rebind {typedef MyAllocator other;};
};

int main()
{
    typedef char T;
#ifdef OWN_ALLOCATOR
    std::vector<T, MyAllocator<T> > v;
#else
    std::vector<T> v;
#endif
    volatile unsigned long long x = 0; // volatile sink so the reads cannot be optimized away
    v.reserve(1000000); // or more. Make sure there is always enough allocated memory
    for (auto i = 0ul; i < 1ul << 18; i++) {
        v.resize(1000000);
        x += v[rand() % v.size()]; // append ._x here when T is the NonTrivial type below
        v.clear(); // or v.resize(0);
    }
}

The timing difference is marked:

g++ -g -O3 -std=c++0x -I ~/custom/boost/ test.cpp -o test 

real    0m9.300s
user    0m9.289s
sys 0m0.000s

g++ -g -O3 -std=c++0x -DOWN_ALLOCATOR -I ~/custom/boost/ test.cpp -o test 

real    0m0.004s
user    0m0.000s
sys 0m0.000s

I can only assume that what you are seeing is related to the standard library optimizing allocator operations for char (it being a POD type).
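To illustrate the kind of shortcut I mean, here is a sketch (not the actual libstdc++ code; default_construct_n is a name I made up) of how a library can detect a trivial value type at compile time and fill the new elements in bulk instead of calling the allocator's construct() once per element:

#include <cstring>
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical helper: default-construct n elements starting at p.
// For trivial types the whole range is zero-filled in one go, so a
// user-provided construct() never runs; otherwise construct() is
// called element by element.
template <typename Alloc, typename T>
void default_construct_n(Alloc& alloc, T* p, std::size_t n)
{
    if (std::is_trivial<T>::value) {
        std::memset(p, 0, n * sizeof(T)); // bulk zero-fill fast path
    } else {
        for (std::size_t i = 0; i < n; ++i)
            alloc.construct(p + i);       // per-element slow path
    }
}

A bulk zero-fill like this would also be consistent with your second observation of every char reading as 0 after resize() (and freshly mapped pages from the OS are typically zeroed anyway).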

The timings get even farther apart when you use

struct NonTrivial
{
    NonTrivial() { _x = 42; }
    virtual ~NonTrivial() {}
    char _x;
};

typedef NonTrivial T;

In this case, the default allocator takes in excess of 2 minutes (still running), whereas the 'dummy' MyAllocator takes ~0.006s. (Note that this invokes undefined behaviour, because it reads elements that have never been properly initialized.)
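If you want to keep that comparison without the undefined read, one option (a sketch; it would replace the x += ... line inside the loop of the test bed above) is to read an element only in the build where the elements really have been constructed:

#ifdef OWN_ALLOCATOR
        x += v.size();                  // elements were never constructed, so don't read them
#else
        x += v[rand() % v.size()]._x;   // default allocator: elements are fully constructed
#endif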

OTHER TIPS

(With corrections thanks to GManNickG and Jonathan Wakely below)

In C++11, with the post-Standard correction proposed in N3346 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3346.pdf), resize() constructs the added elements through the custom allocator.

In earlier versions, resize() value initialises the elements added, which takes time.

These initialisation steps have nothing to do with memory allocation; they concern what is done to the memory after it has been allocated. With the older semantics, value initialisation is an unavoidable expense.
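Conceptually, the two behaviours differ roughly like this (a sketch of the semantics, not real library code; n, a, old_end and new_end are illustrative names):

// Pre-N3346 semantics: resize(n) appends copies of a value-initialized
// temporary, so with the default allocator every new char ends up 0.
v.insert(v.end(), n - v.size(), char());

// C++11 with N3346: resize(n) default-inserts each new element through
// the allocator, which for MyAllocator means the empty construct() and
// therefore no per-element work at all.
for (char* p = old_end; p != new_end; ++p)
    std::allocator_traits<MyAllocator>::construct(a, p);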

Given the state of C++11 Standards compliance in current compilers, it would be worth looking at your headers to see which approach is in use.
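One way to check without digging through the headers is to instrument construct() and see whether resize() calls it with no arguments (the C++11 default-insert behaviour) or with a value to copy from (the older value-initialising behaviour). This is only a sketch; ProbingAllocator and the counters are names I made up:

#include <iostream>
#include <memory>
#include <new>
#include <utility>
#include <vector>

static unsigned long g_no_arg_constructs = 0; // default-insert path (C++11 + N3346)
static unsigned long g_value_constructs  = 0; // copy-from-T() path (older behaviour)

template <typename T>
struct ProbingAllocator : std::allocator<T> {
    template <typename U> struct rebind { typedef ProbingAllocator<U> other; };
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args)
    {
        if (sizeof...(Args) == 0) ++g_no_arg_constructs;
        else                      ++g_value_constructs;
        ::new (static_cast<void*>(p)) U(std::forward<Args>(args)...); // still construct properly
    }
    template <typename U> void destroy(U* p) { p->~U(); }
};

int main()
{
    std::vector<char, ProbingAllocator<char> > v;
    v.resize(1000);
    std::cout << "no-arg constructs: " << g_no_arg_constructs << "\n"
              << "value constructs:  " << g_value_constructs  << "\n";
}

If both counters stay at zero, that in itself tells you the library is bypassing the allocator for this element type altogether.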

The value initialisation was sometimes unnecessary and inconvenient, but it also protected a lot of programs from unintended mistakes. For example, someone might think they can resize a std::vector<std::string> to have 100 "uninitialised" strings and then start assigning into them before reading from them, but a precondition of the assignment operator is that the object being assigned to has been properly constructed; otherwise it will likely find a garbage pointer and try to delete[] it. Only careful placement new-ing of each element can construct them safely. The API design errs on the side of robustness.
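A sketch of that trap, using the templated non-constructing MyAllocator<T> from the test bed above (the commented-out assignment is the undefined behaviour; placement new is the only safe way to bring such an element to life):

#include <new>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string, MyAllocator<std::string> > v;
    v.resize(100);                    // with MyAllocator, no std::string is ever constructed

    // v[0] = "oops";                 // undefined behaviour: operator= runs on raw memory
                                      // and will likely try to free a garbage pointer

    ::new (static_cast<void*>(&v[0])) std::string("fine"); // placement new constructs it safely
    v[0].~basic_string();             // and, since destroy() is also empty, it must be
                                      // destroyed manually before the memory is released
}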

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow