Question

Before the performance people tear my head off: yes, I have done profiling before asking this :)

I'm once again looking at my one-of-a-type container, and while I have a solution that works, its performance is poor because each type of item that's cached results in a separate heap allocation (which is, of course, expensive).

Based on static analysis of my program's input, I have figured out a way to know the total size required by all of the objects that might be put in the cache object that's getting passed around. Basically, I have a list of objects that may be constructed in a given cache object, so I know in advance how much I might have to cache -- but only at runtime, not at compile time.

Basically, what I'd like to do is what boost::make_shared does -- gets a single memory block, and constructs the shared_ptr bits as well as the controlled object in the same memory block.

I don't have to worry about preserving copying behavior as the cache object is noncopyable and passed around by pointer by clients (it's usually stored in something like a ptr_vector or a std::auto_ptr).

I'm not familiar, however, with how exactly one would implement such a container -- namely, how one follows alignment restrictions and the like.

In pseudocode, what I'd like to do:

//I know a lot of what's in here is not portable -- I need to run only on x86
//and x64 machines. Yes, this couple of classes looks hacky, but I'd rather
//have one hacky class than a whole program full of them :)

class CacheRegistrar
{
    //Blah blah
public:
    //Figures out what objects will be in the cache, etc
    const std::vector<std::size_t>& GetRequiredObjectSizes() const;
    //Other stuff...
    template <typename T>
    void RegisterCacheObject();
    template <typename T>
    std::size_t GetObjectIndex() const;
    // etc.
};

class CacheObject;

std::auto_ptr<CacheObject> CacheObjectFactory(const CacheRegistrar& registrar)
{
    //Pretend this is in a CPP file and therefore CacheObject is defined...
    const std::vector<std::size_t>& sizes(registrar.GetRequiredObjectSizes());
    std::size_t sumOfCache = std::accumulate(sizes.begin(), sizes.end(),
        static_cast<std::size_t>(0));
    sumOfCache += sizeof(CacheObject);
    boost::scoped_array<char> buffer(new char[sumOfCache]);
    CacheObject *obj = new (reinterpret_cast<void *>(buffer.get())) CacheObject;
    buffer.release(); //PSEUDOCODE (boost::scoped_array has no release member);
    return std::auto_ptr<CacheObject>(obj); //Nothrow
}

class CacheObject
{
    CacheRegistrar *registrar; //Set by my constructor
public:
    template<typename T>
    T& Get()
    {
        char * startOfCache = reinterpret_cast<char *>(this) + 
            sizeof(CacheObject);
        char * cacheItem = startOfCache + registrar->GetObjectIndex<T>();
        return *reinterpret_cast<T*>(cacheItem);
    }
};
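
For the alignment part, my guess is that GetObjectIndex would have to hand out offsets rounded up to each type's alignment rather than a raw running sum of sizes -- something like this sketch (AlignUp and PlaceObject are names I just made up, and boost::alignment_of stands in for the alignof I don't have pre-C++11):

#include <boost/type_traits/alignment_of.hpp>

//Round offset up to the next multiple of alignment. Assumes alignment
//is a power of two, which holds for every x86/x64 type I care about.
inline std::size_t AlignUp(std::size_t offset, std::size_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

//Sketch of registration bookkeeping: record 'offset' as T's index,
//then return the new end-of-buffer for the next registration.
template <typename T>
std::size_t PlaceObject(std::size_t currentEnd)
{
    std::size_t offset = AlignUp(currentEnd, boost::alignment_of<T>::value);
    return offset + sizeof(T);
}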

Is my general concept sound here? Is there a better way of accomplishing this?


Solution

But first, read this article by Andrei Alexandrescu on what he thinks he should have written in the small-object allocation chapter of Modern C++ Design -- a way to build heaps using Heap Layers (by yours truly). I used Heap Layers to build Hoard, DieHard, and DieHarder, as well as the custom allocators used in our OOPSLA 2002 paper, Reconsidering Custom Memory Allocation, which you should also read before embarking on creating a custom allocator.

OTHER TIPS

Check out the Loki small objects allocator.

Quick googling didn't yield any direct human-oriented docs. There is Doxygen-generated documentation, but it's not especially grokkable. However, the design and implementation are documented in Andrei Alexandrescu's "Modern C++ Design".

If you just want efficient recycling for objects of a given class, then consider a simple free-list – possibly a free-list of raw storage chunks.
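
As a minimal sketch of that idea -- fixed-size raw chunks recycled through an intrusive singly-linked list; all names here are made up rather than taken from any library:

#include <cstddef>
#include <new>

//Minimal free list for fixed-size raw storage chunks (illustrative only).
class FreeList
{
    union Node { Node* next; }; //a freed chunk stores the link in itself
    Node* head_;
    std::size_t chunkSize_;
public:
    explicit FreeList(std::size_t chunkSize)
        : head_(0)
        , chunkSize_(chunkSize < sizeof(Node) ? sizeof(Node) : chunkSize)
    {}
    void* Allocate()
    {
        if (head_) //recycle a previously freed chunk when possible
        {
            void* p = head_;
            head_ = head_->next;
            return p;
        }
        return ::operator new(chunkSize_); //otherwise fall back to the heap
    }
    void Deallocate(void* p) //push the chunk back for reuse
    {
        Node* node = static_cast<Node*>(p);
        node->next = head_;
        head_ = node;
    }
    ~FreeList() //release whatever is still parked on the list
    {
        while (head_)
        {
            Node* next = head_->next;
            ::operator delete(head_);
            head_ = next;
        }
    }
};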

Cheers & hth.,

The key issue I see is returning an auto_ptr for memory allocated in a non-default fashion. You could solve this by defining a suitable overloaded delete, but it's better to define your own destroy function as part of the factory. If you do this, you also localise the memory management in the Cache class, allowing you more freedom to improve performance local to that class. Of course, using a smart pointer to control memory management is a good idea; what you would need to do is define your own allocator and define a smart_ptr to use it.
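
As a sketch of that smart-pointer idea, here is how the factory might hand back a boost::shared_ptr whose deleter knows about the custom allocation (CacheObjectDeleter and MakeCacheObject are illustrative names, reusing the CacheRegistrar/CacheObject types from the question):

#include <boost/shared_ptr.hpp>
#include <numeric>
#include <new>

//Illustrative deleter: undo the placement-new construction, then free
//the raw block that the factory allocated with new char[].
struct CacheObjectDeleter
{
    void operator()(CacheObject* p) const
    {
        p->~CacheObject();                    //destroy the object in place
        delete[] reinterpret_cast<char*>(p);  //free the underlying buffer
    }
};

boost::shared_ptr<CacheObject> MakeCacheObject(const CacheRegistrar& registrar)
{
    const std::vector<std::size_t>& sizes(registrar.GetRequiredObjectSizes());
    std::size_t total = sizeof(CacheObject) +
        std::accumulate(sizes.begin(), sizes.end(), std::size_t(0));
    char* buffer = new char[total]; //one block: header plus cached items
    CacheObject* obj;
    try
    {
        obj = new (buffer) CacheObject; //construct the header in place
    }
    catch (...)
    {
        delete[] buffer; //don't leak if the constructor throws
        throw;
    }
    //boost::shared_ptr calls the deleter itself if its internal
    //control-block allocation fails, so no further guard is needed here.
    return boost::shared_ptr<CacheObject>(obj, CacheObjectDeleter());
}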

For reference, another approach to managing custom allocation is to define a custom operator new, i.e. this sort of thing:

struct Cache
{
    void* allocate(size_t size)
    {
        size_t blockSize = sizeof(size_t) + size;
        // Placeholder: do whatever is appropriate for blocks of size 'blockSize'
        size_t* block = static_cast<size_t*>(malloc(blockSize));
        *block = blockSize;  // record the size in a header before the object
        return block + 1;    // hand out the memory just past the header
    }
    void destroy(void* p)
    {
        size_t* block = static_cast<size_t*>(p) - 1; // step back to the header
        size_t blockSize = *block;
        // Placeholder: do whatever is appropriate for blocks of size 'blockSize'
        free(block);
    }
};
Cache cache;


void* operator new (size_t size, Cache& cache )
{
    return cache.allocate(size);
}

struct CacheObject 
{
    void operator delete(void* p)
    {
        cache.destroy(p);
    }
};


CacheObject* co = new (cache) CacheObject;
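
One caveat with the operator new overload above: if the CacheObject constructor throws, the compiler releases the memory only through a matching placement operator delete, so it's worth providing one (a sketch):

//Matching placement delete: invoked automatically only when the
//constructor called via 'new (cache) CacheObject' throws.
void operator delete (void* p, Cache& cache)
{
    cache.destroy(p);
}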