Storing objects whose size are run-time dependent contiguously in memory

https://softwareengineering.stackexchange.com/questions/347341

09-01-2021
|

Question

The background

I am working on an ECS in C++ for fun and I am trying to make it as efficient as possible. One of the optimisations I am striving to implement is to minimise cache misses by storing the components contiguously in the memory.

Since I want to expose the interface to Lua, I cannot use all the compile time goodness C++ offers. The tough part about it is that I don't know the size of the objects at the compile time. As I understand it, I can't just use std::vector<BaseComponent> and not experience object slicing.

TL;DR - The question

How do I store objects, whose sizes are known only at runtime, contiguously in the memory? The sizes of the objects in one container are identical.

Solution

How do I store objects, whose sizes are known only at runtime, contiguously in the memory? The sizes of the objects in one container are identical.

Allocate large chunks of memory.
Use placement new to initialize objects.

char* mem = new char[OBJECT_COUNT*sizeof(Object)];

// ...

// Keep track of where the next object can be constructed.
Object* obj = new (mem + offset) Object();

If you want the memory allocated to be aligned as per the alignment requirements of Object, you may use:

struct alignas(Object) object_s
{
  char dummy[sizeof(Object)];
};

object_t* mem = new object_s[OBJECT_COUNT]);

// ...

// Keep track of where the next object can be constructed.
Object* obj = new (mem + offset) Object();

OTHER TIPS

If you are looking for a standard approach, I cannot help you, but this is not particularly difficult to roll yourself if all the objects are the same size.

int maxNumberOfObjs = x;
int objectSize = y;
int count = 0;

void* ptr = malloc(maxNumberOfObjs * objectSize); //don't forget to free this or you will get a mem leak

//Add object
MyObj* myObjPtr = new (ptr + count * objectSize) MyObj();
++count;

//Iterate over your objects
int i = 0;
while (i < count)
{
    MyObj* obj = ptr + i * objectSize;
    ++i;
}

Been a while since I have done C++, so you may have to do a few tweaks and casts, but you get the idea I hope

Simplest solution which avoids getting down to x-raying bits and bytes and losing all type safety in the intermediate implementation: don't abstract/mix storage at the individual component, abstract/mix at the container level. Instead of BaseComponent you can use BaseComponents (notice plural) for your abstraction. Of course that means when you insert a concrete component (perhaps through some sort of factory), you need to downcast the relevant abstract container (though this is generally unavoidable with an ECS, especially if you can't use all the compile-time goodies, but it's centralized to one place in your ECS implementation) with the matching type info to insert the concrete component subtype to it by value.

This gets nuanced to the implementation (I can help with that if needed), but that's the fundamental strategy. The easiest way to store mixed types of elements where the elements of the same subtype are stored contiguously is abstract at the level of the container, and your polymorphic container is actually a container of containers.

A very simple (hasty and nasty, but minimalist) example, and this doesn't cover the bells and whistles of associating components to entities, or allowing non-hardcoded runtime type registration of components, but it shows a fundamental way to keep contiguity in your component subtypes:

struct FooComponent
{
    enum {id = 0};
    int x;
    int y;
};

struct BarComponent
{
    enum {id = 1};
    float z;
};

struct Components
{
    virtual ~Components() {}
};

template <class T>
struct ComponentsT: public Components
{
    // All components of a given type are stored contiguously,
    // by value.
    std::vector<T> comps;
};

struct Ecs
{
    Ecs()
    {
        // Insert concrete sequences/containers for Foo and Bar components.
        comps.emplace_back(new ComponentsT<FooComponent>);
        comps.emplace_back(new ComponentsT<BarComponent>);
    }

    template <class T>
    void insert(const T& comp)
    {
        // Fetch appropriate container based on associated type ID to T.
        typedef ComponentsT<T> SubComponents;
        SubComponents* sub_comps = dynamic_cast<SubComponents*>(comps[T::id].get());

        // Insert component of type T.
        sub_comps->comps.push_back(comp);
    }

    template <class T>
    T* get(int idx)
    {
        // Fetch appropriate container based on associated type ID to T.
        typedef ComponentsT<T> SubComponents;
        SubComponents* sub_comps = dynamic_cast<SubComponents*>(comps[T::id].get());

        // Return component of type T at specified index.
        return &sub_comps->comps[idx];
    }

    // A container of containers of different component types.
    std::vector<std::unique_ptr<Components>> comps;
};

At Runtime

Now being able to do things like register new component types on the fly at runtime gets really involved with a full-blown ECS implementation.

But the basic thing to remember is that you do need compile-time implementation to construct and destroy some type, T. So you have to call a function which has that type info to be able to do these things which require such type information.

But of course if you are calling such functions from Lua, you have no such type info. So the general strategy is have Lua use, say, some type ID (could just be an integer, or string, or whatever floats your boat) to indirectly tell the ECS what functions to call. And when you register new types of components, you can give the ECS metadata about how to construct and destroy that particular component type (ex: function pointers or a virtual interface to construct and destroy BarComponent) and associate it to this type ID/key. That metadata could actually store ComponentsT<BarComponent> as part of it along with the function pointers/virtual functions to retrieve and construct and destroy and insert BarComponent instances.

Simple Example of Runtime Registration

All right, out of boredom I included a very minimalist and hasty example of how to do the above as well using string keys (I don't recommend string keys normally in an ECS -- interned strings might be a decent alternative, but they're simple).

#include <vector>
#include <map>
#include <string>
#include <memory>
#include <iostream>

struct FooComponent
{
    int x;
    int y;
};

struct BarComponent
{
    float z;
};

// Abstract container.
struct Components
{
    virtual ~Components() {}

    virtual void insert() = 0;
    virtual void erase(int idx) = 0;
    virtual int size() const = 0;
    virtual void* get(int idx) = 0;
};

// Container subtype.
template <class T>
struct ComponentsT: public Components
{
    virtual void insert() override
    {
        comps.push_back(T());
    }

    virtual void erase(int idx) override
    {
        comps[idx] = comps[comps.size() - 1];
        comps.pop_back();
    }

    virtual int size() const override
    {
        return static_cast<int>(comps.size());
    }

    virtual void* get(int idx) override
    {
        return &comps[idx];
    }

    // Stores all components of type T contiguously by value.
    std::vector<T> comps;
};

struct Ecs
{
    // Registers a new type of component and associates the type to
    // a string key.
    template <class T>
    void register_type(const std::string& key)
    {
        Components* comps_t = new ComponentsT<T>; 
        comps[key] = std::unique_ptr<Components>(comps_t);
    }

    // Fetch the container of components associated to string key
    // or nullptr if no such type is registered.
    Components* fetch_components(const std::string& key)
    {
        auto pos = comps.find(key);
        return pos != comps.end() ? pos->second.get(): nullptr;
    }

    // Associates a container of component subtypes to a string key.
    std::map<std::string, std::unique_ptr<Components>> comps;
};

int main()
{
    using namespace std;

    Ecs ecs;

    // Register FooComponents to a string key.
    ecs.register_type<FooComponent>("FooComponent");

    // Register BarComponents to a string key.
    ecs.register_type<BarComponent>("BarComponent");

    // Fetch contiguous FooComponents container by string key.
    Components* foo_comps = ecs.fetch_components("FooComponent");

    // Insert a FooComponent to scene.
    foo_comps->insert();

    // Retrieve a FooComponent and set its fields.
    FooComponent* foo = static_cast<FooComponent*>(foo_comps->get(0));
    foo->x = 123;
    foo->y = 456;

    // Fetch contiguous BarComponents container by string key.
    Components* bar_comps = ecs.fetch_components("BarComponent");

    // Insert some new BarComponents to scene.
    for (int j=0; j < 3; ++j)
        bar_comps->insert();

    // Output how many FooComponents there are in the scene.
    cout << foo_comps->size() << " FooComponents" << endl;

    // Output how many BarComponents there are in the scene.
    cout << bar_comps->size() << " BarComponents" << endl;
}

How do I store objects, whose sizes are known only at runtime, contiguously in the memory?

This is a trivial question in C.

static void* mem_offset(void* ptr, int offset)
{
    return (char*)ptr + offset;
}

...
// Allocate the memory for 'num' objects given 'size' (could be
// a runtime lvalue)
void* mem = malloc(num*size);

// Access the component at the nth position.
void* component = mem_offset(mem, n*size);

// Do something with `component`, maybe passing it to Lua.

// Free the array of components.
free(mem);

The thing is that if you only have the size of an object at runtime, then you have to largely abandon the richness of the type system offered in C++ and even C. Things break down into bits and bytes at that stage unless you have the compile-time information in advance to know the component type and the type and alignment of its data fields, at which point you would already have the size of the ECS component at compile-time once more.

So you're back to working with data like it's just arrays of bytes if you're going to be reading it and/or modifying when you don't even know what size it is at compile-time. If the type information is available to Lua, then you might just store these bits and bytes opaquely in C++, pass the data to Lua, at which point Lua will have the type info necessary to translate those bits and bytes back into something that resembles an actual data type (ex: a Lua table or Lua userdata with associated table of functions).

Of course if you have partial information about the type at compile-time, like that it inherits a base class or that its first fields have members of a specific type with certain alignment, then you can work with what's guaranteed to be known in advance at compile-time through the type system. As a low-level C example:

struct KnownType
{
    int x;
    int y;
    void (*do_something)(struct UnknownType* obj);
};

struct UnknownType
{
    // Parts of the type known at compile-time.
    struct KnownType known_data;

    // Additional stuff stored below is known only at runtime.
    // This is effectively a variable-length struct.
    char trailing_data[];
};

// Access the nth component and modify its 'x' and 'y' fields
// which we know in advance at compile-time that the component
// will have, even if we don't know anything else about it.
struct UnknownType* unknown_data = mem_offset(mem, n*unknown_type_size);
unknown_data->known_data.x = 123;
unknown_data->known_data.y = 456;

// Do something (polymorphism):
unknown_data->known_data.do_something(unknown_data);

Then you can at least work with the data for the unknown type as though it were a known type and work with those x and y fields, but the trailing data is something that you won't be able to access through the type system in C and C++ (maybe only being able to work with it in a nice way in Lua).

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange