Question

This is not strictly related to C++ but its type system serves to illustrate the problem well.

Assume:

  • We have a generic template Container<T> (such as a std::vector<T>) which stores an unspecified number of elements of the same type T on the heap. Each element is allocated and deallocated via the constructor and destructor of T.

  • A type M that stores m elements on the heap (so M itself is a kind of container). However, the value of m is not known at compile time, so it cannot be used as a template parameter to M.

Problem:

  • If we try to use a type Container<M>, accessing the elements of M inside this container would require two pointer dereferences, which in general has worse performance than if all elements of M were laid out contiguously.
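To make the two dereferences concrete, here is a minimal sketch of such an M and of iterating over a Container<M> (taken as std::vector<M>); the names and the element type double are illustrative, and copying is omitted for brevity:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical M: owns a heap block of m doubles, where m is fixed per
// object but only known at run time. Copy operations omitted for brevity.
struct M {
    explicit M(std::size_t m) : size(m), data(new double[m]()) {}
    M(M&& o) noexcept : size(o.size), data(o.data) { o.size = 0; o.data = nullptr; }
    M(const M&) = delete;
    M& operator=(const M&) = delete;
    ~M() { delete[] data; }

    std::size_t size;
    double*     data;  // separate heap allocation per M
};

// Summing over std::vector<M> touches two levels of indirection:
// the vector's buffer, then each M's own buffer.
double sum(const std::vector<M>& c) {
    double s = 0.0;
    for (const M& e : c)                       // dereference 1: vector's buffer
        for (std::size_t i = 0; i < e.size; ++i)
            s += e.data[i];                    // dereference 2: M's own buffer
    return s;
}
```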

Non-ideal solution:

  • Create a separate class ContainerFixed<T> that manages the memory of T directly, under the assumption that every T has the same size throughout the container. T would also have to exchange information about its storage with the containing class. However, if we implement another kind of container, we have to repeat this work all over again.
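Under that assumption (every M in the container has the same run-time length m), such a ContainerFixed can be sketched as one flat buffer plus slicing; the names and the element type double are illustrative:

```cpp
#include <cstddef>
#include <vector>

// Sketch of the ContainerFixed idea: n elements, each a "row" of the same
// run-time length m, stored back to back in a single allocation. A real
// version would also have to run M's per-element construction/destruction
// logic; here the elements are plain doubles for brevity.
class ContainerFixed {
public:
    ContainerFixed(std::size_t n, std::size_t m) : m(m), data(n * m, 0.0) {}

    double*     operator[](std::size_t i) { return data.data() + i * m; }  // row i
    std::size_t inner_size() const { return m; }
    std::size_t size() const       { return data.size() / m; }

private:
    std::size_t         m;     // shared inner size, fixed at construction
    std::vector<double> data;  // one contiguous buffer for all rows
};
```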

The problem arises because the container class is oblivious to the storage requirements of T, so it defers the allocation to T itself. If the container knew that T requires only a fixed (though not compile-time-known) number of elements, this issue could be avoided.

Now I feel that if there were some way to communicate to the container that M has a definite size m (as in M<m>, which is not legal in C++), then from the type signature of Container<M<m> > one could potentially deduce that a faster allocation scheme is possible. (It's a bit like dependent types, in a way.) That is, any Container1<Container2<m> > could be allocated more efficiently as a "flattened" container.

What would be a more general solution to this problem? I don't think this is something that can be easily solved in C++ (feel free to prove me wrong), so I would certainly like to hear if this is something that other languages can solve in a much simpler way (perhaps with more sophisticated type systems).


Solution

I think you could more or less solve this by having an API similar to allocators, whereby a container could specify how to allocate its metadata and elements contiguously. That is, a container would provide an “unboxed mode” in which it would presumably be immutable, or at least of fixed size. You couldn’t allocate unboxed containers on the stack because their sizeof is not a compile-time constant, but you could use this API when constructing a nested container.
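One way to sketch that allocator-like protocol in current C++ (all names here are hypothetical, not part of any standard API): the inner container type reports its byte footprint for a given element count and constructs itself into storage the outer container provides, so all inner elements land in one contiguous block. For brevity this sketch reinterprets raw storage directly; production code would want placement-new and would have to mind alignment (here the offsets are multiples of sizeof(double), and the buffer start comes from operator new, so it works out).

```cpp
#include <cstddef>
#include <vector>

// Inner container in "unboxed mode": it does not allocate for itself, but
// reports its footprint and is constructed into caller-provided storage.
struct UnboxedVec {
    static std::size_t footprint(std::size_t m) { return m * sizeof(double); }

    UnboxedVec(void* storage, std::size_t m)
        : size(m), data(static_cast<double*>(storage)) {
        for (std::size_t i = 0; i < m; ++i) data[i] = 0.0;  // value-init elements
    }

    std::size_t size;
    double*     data;  // points into the outer container's buffer, not its own heap
};

// Outer container: computes the total footprint up front, makes a single
// allocation, and constructs each inner container in place. As the answer
// notes, the result is of fixed size once built.
class Outer {
public:
    explicit Outer(const std::vector<std::size_t>& ms) {
        std::size_t total = 0;
        for (std::size_t m : ms) total += UnboxedVec::footprint(m);
        buffer.resize(total);                       // one allocation for all elements
        std::size_t off = 0;
        for (std::size_t m : ms) {
            inners.emplace_back(buffer.data() + off, m);
            off += UnboxedVec::footprint(m);
        }
    }
    UnboxedVec& operator[](std::size_t i) { return inners[i]; }

private:
    std::vector<unsigned char> buffer;  // contiguous storage for every inner element
    std::vector<UnboxedVec>    inners;  // lightweight views into buffer
};
```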

This would seem to work naturally for std::vector and std::array, but for std::list and std::deque (which is typically implemented as an array of pointers to fixed-size chunks) or tree structures such as std::map, it's not necessarily clear what such "unboxing" would actually mean. Do you stand to gain from allocating all the nodes of the tree contiguously while still traversing them through lots of pointer indirections? Without benchmarking, it's hard to say.

OTHER TIPS

Now I feel that if there were some way to communicate to the container that M has a definite size m (as in M<m>, which is not legal in C++), then from the type signature of Container<M<m> > one could potentially deduce that a faster allocation scheme is possible. (It's a bit like dependent types, in a way.) That is, any Container1<Container2<m> > could be allocated more efficiently as a "flattened" container.

This is not a conceptually simple problem at all, because you have two competing aggregates here (the class M and the vector) which want to be independently-sized dynamically.

From a compiler and language design standpoint with a focus on standard approaches to type systems, this is an incredibly difficult problem and would require some fusion/coalescing of the two aggregate concepts.

In C++, if you want the most efficient solution here, you typically have to implement this fused aggregate yourself: the container and M as a single class merging the two ideas. It's a real hassle, but if you really need the extra performance, that's typically the way to go. The alternative is to reach for a custom allocator under the hood, but that is still almost like implementing a separate data structure, and it still exposes the overhead of pointers in the vector, which degrades locality of reference (though not as badly as omitting the custom allocator).

A language that could tackle this might be ideal for today's emphasis on memory optimization in performance-critical fields, but its type system would take on a very unusual nature: it would most likely have to be designed to interact with containers with a non-scalar mindset.

You might also be able to design something like this in C++, but it would tend to require a new set of containers that double as both allocator and data structure: we could request, say, five elements of T, but in a way that yields a new aggregate, M, with associated metadata that lets you access that small contiguous block as though it were a separate aggregate of T's on its own.
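A sketch of that direction, with hypothetical names: the container hands out offset-based handles (playing the role of M) into one contiguous pool, so the T's stay contiguous, and the handles remain valid even when the underlying buffer reallocates.

```cpp
#include <cstddef>
#include <vector>

// Sketch of a container that doubles as allocator: callers request m elements
// of T and get back a lightweight handle (the "M" aggregate of the text) into
// the pool's contiguous storage. Handles store offsets rather than pointers,
// so they survive reallocation of the underlying buffer.
template <typename T>
class PoolContainer {
public:
    struct Handle {                  // the metadata associated with one M
        std::size_t offset;          // where its elements start in the pool
        std::size_t size;            // its m
    };

    Handle allocate(std::size_t m) {
        Handle h{storage.size(), m};
        storage.resize(storage.size() + m);  // may reallocate; handle stays valid
        return h;
    }

    T*       elems(Handle h)       { return storage.data() + h.offset; }
    const T* elems(Handle h) const { return storage.data() + h.offset; }

private:
    std::vector<T> storage;  // every M's elements live here, contiguously
};
```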

It also raises questions about the homogeneity and alignment requirements of the data, because M is not simply m T's: it carries associated metadata (its size, potentially, and possibly other data tied to the m T's). That leaves the question of where to store and align this metadata within the container. Storing it outside, in a parallel structure, may hinder performance and would reintroduce a pointer/index overhead into the container's contents rather than relative addressing. There are also construction/destruction concerns for M that are separate from those of its m T's.

Another issue: if M is allowed to resize m while residing within the container, a whole new and even more complex dynamic appears, and solutions like vector become ill-suited to the problem, since resizing a single M within one would take linear time. Either way, M as an aggregate would have to avoid storing pointers into the vector, as such pointers are invalidated whenever the vector's capacity changes, whenever an element is removed anywhere other than the back, and so on. Basically, to be optimal, this design needs the ideas of container and sub-container tightly coupled in ways that would be incredibly difficult to generalize through a new kind of type system.

Licensed under: CC-BY-SA with attribution