Question

I am working on a concurrent data structure that uses a number of small sized temporary objects. A lot of these objects are the same size. So to reduce the strain on the memory allocator I have been using a thread local map to store the objects as <size, list> tuples.

When a thread needs an object, it checks the map for a suitable object before going to the allocator. This works well and has improved performance significantly, however it is prone to the issue where overtime one thread loses its entire pool to other threads, forcing it to allocate new objects. If the application runs for a long period of time, I find a few threads having memory pools way to large.

To solve this I want to add a shared memory pool between the thread local pools and the allocator. Since the number of structs and the size of the structs are constant at compile time, I figure that there should be someway using macros to map each size to an array position. Allowing for a much lighter memory management.

Here is my current solution

#define RC_OBJECT_COUNT 0
#define ADD_RC_OBJECT(object) \
    #ifndef RC_MAP_``sizeof(object)''
        #define RC_OBJECT_TEMP RC_OBJECT_COUNT \
        #undefine RC_OBJECT_COUNT \
        #define RC_OBJECT_COUNT RC_OBJECT_TEMP+1 \
        #define RC_MAP_``sizeof(object)'' RC_OBJECT_TEMP
    #endif

Is there a way to echo the the result of the call to sizeof(object) into the defined variable name? Preferably without a separate configuration script.

Was it helpful?

Solution

I've written code that works kind of like what you are discussing, but I didn't use preprocessor macros at all the way you are trying to do.

My code has a single "object manager", with a small API. Any part of my program that wants to get objects calls an API function, the "register function", that says "I want to request objects with the following characteristics". The register function returns a handle. Then, there is a function GetObject() that takes the handle as an argument, and returns a pointer to an object. When the code is done with the object there is a function ReleaseObject() that takes a pointer to the object.

For each different object there is a linked list that I call the "ready list". The code always inserts and removes from the head of the list (since one uninitialized object is as good as another; they are all the same size). My code is single-threaded so I don't have any locking issues, but for multithreaded I would need to put a lock on each ready list. (But it's very fast to insert or remove at the head of a linked list so no thread would need the lock very long.)

For my purposes, different parts of my program could share objects, so I had a reference count on each object. ReleaseObject() would decrement the reference count, and when it went to zero, would put the object on the appropriate ready list.

The handle returned by GetObject() is really just a pointer to the linked list structure.

In my code, if there is a call to GetObject() and the ready list is empty, then malloc() is called automatically and a new object created and returned. The register function takes a pointer to a function to call to create an object with malloc(), a pointer to a function to free an object with free(), and a pointer to a "sanity check" function (since I like my programs to check themselves at runtime with calls to assert()), and any arguments like size of object.

If multiple parts of my program register that they want the same kind of object, the object manager notices this and just returns a handle to the ready list that was already set up by the first call to register. Now they are sharing objects in a single ready list.

This may sound complicated but it didn't take me long to build and it works very well. If there is no object on the ready list, the object manager knows to just call the function pointer stored in the ready list struct, to get a new object, and then it returns that.

The most common bug I have found in my programs: failure to call ReleaseObject() when done with the object. Then the program has a memory leak and calls malloc() a lot, and on an embedded platform runs out of memory and dies. Usually it's very easy to notice this, and add in an appropriate call to ReleaseObject().

EDIT: (In response to a question in the comments) The object manager keeps an array of different object management struct instances. Each struct stores three function pointers: pointer to a "new" function, a pointer to a "delete" function, a pointer to a "sanity check" function; a handful of values that are passed to the "new" function when it is called (for example, size of a desired buffer); and the head of the linked list of objects. When code calls the "register" function, the object manager checks to see if any spot in this array has identical values from "register" (the 3 function pointers and the handful of values). If the identical values are found, the object manager returns a pointer to that object manager struct instance. If the identical values are not found, the object manager copies those values into the next available struct in the array and returns a pointer to that.

This means that my "register" function is O(N) in the number of different kinds of objects being managed, but for my app there are only about 4 different kinds of objects so I never tried to optimize this. My "get" function is O(1), as it has a pointer right to the correct object manager struct, and removing from the head of a linked list is a constant-time operation.

The array of object manager structs is allocated by malloc() and if additional object types are registered the code can call realloc() to grow the memory.

In my application, I haven't had the need for an "unregister" operation, but if there were one, it would involve freeing all the objects on that ready list, and marking that spot in the object manager array as unused.

My app is an audio processing engine, and it never wants to call malloc() while processing audio because malloc() might decide to re-organize the free blocks list or something and take a while, and the audio playback might glitch. The engine has an "init" phase before starting playback where the code calls the "register" function and all the audio buffers get allocated; then at runtime the buffers just fly off and onto the ready lists. It really works quite well, and I have run it on low-power DSP chips with no problems.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top