Question

I have a Visual Studio 2008 C++ application where I'm using a custom allocator for standard containers such that their memory comes from a Memory Mapped File rather than the heap. This allocator is used for 4 different use cases:

  1. 104-byte fixed size structure std::vector< SomeType, MyAllocator< SomeType > > foo;
  2. 200-byte fixed size structure
  3. 304-byte fixed size structure
  4. n-byte strings std::basic_string< char, std::char_traits< char >, MyAllocator< char > > strn;

I need to be able to allocate roughly 32MB total for each of these.

The allocator tracks memory usage using a std::map of pointers to allocation size. typedef std::map< void*, size_t > SuperBlock; Each SuperBlock represents 4MB of memory.

There is a std::vector< SuperBlock > of these in case one SuperBlock isn't enough space.

The algorithm used for the allocator goes like this:

  1. For each SuperBlock: is there space at the end of the SuperBlock? If so, put the allocation there. (fast)
  2. If not, search within each SuperBlock for an empty space of sufficient size and put the allocation there. (slow)
  3. Still nothing? Allocate another SuperBlock and put the allocation at the start of it.
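For reference, step 2 amounts to a first-fit scan over the live allocations recorded in the SuperBlock map, which is O(n) per SuperBlock. A minimal sketch (the function name and layout are illustrative, not from the original code):

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Sketch of step 2: first-fit search for a gap between existing
// allocations inside one SuperBlock. O(n) in live allocations.
typedef std::map<void*, std::size_t> SuperBlock; // pointer -> allocation size

// Returns a free address inside [base, base + blockSize) that can hold
// `bytes`, or NULL if no gap is large enough.
void* find_gap(const SuperBlock& sb, char* base, std::size_t blockSize,
               std::size_t bytes) {
    char* cursor = base;
    for (SuperBlock::const_iterator it = sb.begin(); it != sb.end(); ++it) {
        char* allocStart = static_cast<char*>(it->first);
        if (static_cast<std::size_t>(allocStart - cursor) >= bytes)
            return cursor;                     // gap before this allocation
        cursor = allocStart + it->second;      // skip past this allocation
    }
    if (static_cast<std::size_t>(base + blockSize - cursor) >= bytes)
        return cursor;                         // space at the end (step 1)
    return 0;
}
```

Every allocation after the first fragmentation has to walk this map, which is why the cost grows with churn.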

Unfortunately, step 2 can become VERY slow after a while. As copies of objects are made and temporary variables destroyed, I get a lot of fragmentation, which causes a lot of deep searching within the memory structure. Fragmentation is an issue because I have a limited amount of memory to work with (see note below).

Can anybody suggest improvements to this algorithm that would speed up the process? Do I need two separate algorithms (one for the fixed-size allocations and one for the string allocator)?

Note: For those that need a reason: I'm using this algorithm in Windows Mobile where there's a 32MB process slot limit to the Heap. So, the usual std::allocator won't cut it. I need to put the allocations in the 1GB Large Memory Area to have enough space and that's what this does.


Solution

For the fixed sized objects, you can create a fixed sized allocator. Basically you allocate blocks, partition into subblocks of the appropriate size and create a linked list with the result. Allocating from such a block is O(1) if there is memory available (just remove the first element from the list and return a pointer to it) as is deallocation (add the block to the free list). During allocation, if the list is empty, grab a new superblock, partition and add all blocks into the list.
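A minimal sketch of such a free-list pool (class and member names are illustrative; chunks must be at least pointer-sized so the free list can be threaded through them):

```cpp
#include <cassert>
#include <cstddef>

// Fixed-size pool: carve a block into equal chunks and thread a free list
// through the chunks themselves. Allocate and deallocate are both O(1).
class FixedPool {
public:
    // chunkSize must be >= sizeof(FreeNode) so a free chunk can hold a link.
    FixedPool(void* block, std::size_t blockSize, std::size_t chunkSize)
        : head_(0) {
        char* p = static_cast<char*>(block);
        for (std::size_t off = 0; off + chunkSize <= blockSize; off += chunkSize) {
            FreeNode* node = reinterpret_cast<FreeNode*>(p + off);
            node->next = head_;                // push chunk onto free list
            head_ = node;
        }
    }
    void* allocate() {                 // O(1): pop the first free chunk
        if (!head_) return 0;          // empty: caller grabs a new superblock
        FreeNode* node = head_;
        head_ = node->next;
        return node;
    }
    void deallocate(void* p) {         // O(1): push the chunk back
        FreeNode* node = static_cast<FreeNode*>(p);
        node->next = head_;
        head_ = node;
    }
private:
    struct FreeNode { FreeNode* next; };
    FreeNode* head_;
};
```

With the questioner's 104-, 200-, and 304-byte structures, one such pool per size eliminates both the search and the fragmentation.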

For the variable-sized strings, you can reduce the problem to the fixed-size case by allocating only blocks of known sizes: 32 bytes, 64 bytes, 128 bytes, 512 bytes. You will have to analyze the memory usage to choose the buckets so that you don't waste too much memory. For large objects, you can fall back to a dynamic-size allocation pattern; that will be slow, but hopefully the number of large objects is limited.
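Mapping a request to its bucket is then a trivial lookup; a sketch using the bucket sizes named above (the set itself should be tuned to your measured string lengths):

```cpp
#include <cassert>
#include <cstddef>

// Round a request up to the nearest pool bucket. Returns 0 for requests
// larger than the biggest bucket, signalling the slow variable-size path.
std::size_t bucket_for(std::size_t n) {
    static const std::size_t buckets[] = { 32, 64, 128, 512 };
    for (std::size_t i = 0; i < sizeof(buckets) / sizeof(buckets[0]); ++i)
        if (n <= buckets[i]) return buckets[i];
    return 0; // too big: fall back to dynamic-size allocation
}
```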

OTHER TIPS

Can you have a separate memory allocation pool for each different fixed-size type you are allocating? That way there won't be any fragmentation, because the allocated objects will always align on n-byte boundaries. That doesn't help for the variable-length strings, of course.

There's an example of small-object allocation in Alexandrescu's Modern C++ design that illustrates this principle and may give you some ideas.

Building on Tim's answer, I would personally use something akin to BiBOP.

The basic idea is simple: use fixed-size pools.

There are some refinements to that.

First, the size of the pools is generally fixed. It depends on your allocation routine: typically, if you know the OS you're working on maps at least 4 KB at once when you use malloc, then you use that value. For a memory-mapped file, you might be able to increase this.

The advantage of fixed-size pools is that they nicely fight off fragmentation. Since all pages are the same size, you can easily recycle an empty 256-byte page into a 128-byte page.

There is still some fragmentation for large objects, which are typically allocated outside of this system. But it's low, especially if you round large objects up to a multiple of the page size; that way the memory is easy to recycle.

Second, how do you handle the pools? Using linked lists.

The pages are typically untyped (by themselves), so you have a free-list of pages in which to prepare new pages and put "recycled" pages.

For each size category you then have a list of "occupied" pages, in which memory has been allocated. For each page you keep:

  • the allocation size (for this page)
  • the number of allocated objects (to check for emptiness)
  • a pointer to the first free cell
  • a pointer to the previous and next pages (might point to the "head" of the list)

Each free-cell is itself a pointer (or index, depending on the size you have) to the next free-cell.
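The page bookkeeping just described can be sketched as a small header struct; the field names below are illustrative, and `page_of` assumes pages are aligned to a power-of-two size (which makes the "modulo" a bit mask):

```cpp
#include <cassert>
#include <cstddef>

// Sketch of a BiBOP-style page header. The free cells themselves form the
// in-page free list: each free cell's first bytes hold a link to the next.
struct Page {
    std::size_t cellSize;   // allocation size served by this page
    std::size_t liveCount;  // allocated objects (0 => page can be recycled)
    void*       firstFree;  // head of the in-page free-cell list
    Page*       prev;       // doubly linked list of occupied pages
    Page*       next;       //   in the same size category
};

// Find the page header from an interior pointer: with pages aligned to a
// power-of-two size, masking the low address bits suffices.
inline Page* page_of(void* p, std::size_t pageSize) {
    return reinterpret_cast<Page*>(
        reinterpret_cast<std::size_t>(p) & ~(pageSize - 1));
}
```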

The list of "occupied" pages of a given size is simply managed:

  • on deletion: if the page becomes empty, remove it from the list and push it onto the recycled-pages list; otherwise, update the page's free-cell list (note: finding the beginning of the current page is usually a simple modulo operation on the address)
  • on insertion: search starting from head, as soon as you find a non-full page, move it in front of the list (if it's not already) and insert your item

This scheme is really performant memory-wise, with only a single page reserved for indexing.

For multi-threaded / multi-process applications, you'll need to add synchronization (typically a mutex per page), in which case you could take inspiration from Google's tcmalloc (try to find another page instead of blocking; use a thread-local cache to remember which page you last used).

Having said that, have you tried Boost.Interprocess ? It provides allocators.

For the fixed sizes you can easily use a small-object-allocator style of allocator where you allocate a large block that's split into fixed-size chunks. You then create a vector of pointers to available chunks and pop/push as you allocate/free. This is very fast.
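That vector-of-free-chunks variant can be sketched like this (names are illustrative; it trades the intrusive free list for a side vector):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Split a big block into fixed-size chunks and keep pointers to the free
// ones in a vector. push_back/pop_back make allocate/free O(1).
class ChunkPool {
public:
    ChunkPool(void* block, std::size_t blockSize, std::size_t chunkSize) {
        char* p = static_cast<char*>(block);
        for (std::size_t off = 0; off + chunkSize <= blockSize; off += chunkSize)
            free_.push_back(p + off);          // every chunk starts free
    }
    void* allocate() {
        if (free_.empty()) return 0;           // pool exhausted
        void* p = free_.back();
        free_.pop_back();
        return p;
    }
    void deallocate(void* p) { free_.push_back(p); }
    std::size_t available() const { return free_.size(); }
private:
    std::vector<void*> free_;
};
```

Note that the vector itself lives on the normal heap, so unlike the intrusive free list it costs one pointer of overhead per free chunk.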

For variable-length items, it's harder: you either have to deal with searching for available contiguous space or use some other approach. You might consider maintaining another map of all the free nodes, ordered by block size, so you can lower_bound into the map; if the next available node is, say, only 5% too big, return it instead of trying to find available space of the exact size.
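A sketch of that size-ordered free map (the 5% slop threshold is just the figure used above; a multimap allows several free blocks of the same size):

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Free blocks indexed by size. lower_bound finds the smallest block that
// fits; accept it only if it's within 5% of the requested size.
typedef std::multimap<std::size_t, void*> FreeBySize;

void* take_best_fit(FreeBySize& freeBlocks, std::size_t bytes) {
    FreeBySize::iterator it = freeBlocks.lower_bound(bytes);
    if (it == freeBlocks.end()) return 0;           // nothing big enough
    if (it->first > bytes + bytes / 20) return 0;   // >5% oversized: let the
    void* p = it->second;                           // caller search elsewhere
    freeBlocks.erase(it);
    return p;
}
```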

My inclination for variable-sized items would be, if practical, to avoid holding direct pointers to data and instead keep handles. Each handle would be the index of a superblock plus an index of an item within that superblock. Each superblock would have an item list allocated top-down and items allocated bottom-up. Each item's allocation would be preceded by its length and by the index of the item it represents; use one bit of the index to indicate whether an item is 'pinned'.

If an item fits after the last allocated item, simply allocate it. If it would hit a pinned item, move the next-allocation mark past the pinned item, find the next higher pinned item, and try the allocation again. If the item would collide with the item-list but there's enough free space somewhere, compactify the block's contents (if one or more items are pinned, it may be better to use another superblock if one is available). Depending upon usage patterns, it may be desirable to start by only compactifying the stuff that was added since the last collection; if that doesn't provide enough space, then compactify everything.

Of course, if you only have a few discrete sizes of items, you can use simple fixed-size-chunk allocators.

I agree with Tim - use memory pools to avoid fragmentation.

However you may be able to avoid some of the churn by storing pointers rather than objects in your vectors, perhaps ptr_vector?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow