How do you actually access the data of individual components in an Entity-Component-System design? (C++)

https://softwareengineering.stackexchange.com/questions/380556

15-02-2021

Question

I've been scouring information on Entity-Component-System designs for weeks to try to figure out how to implement it in C++, and there are lots of wonderful explanations for different aspects of it, but the one thing everybody seems to overlook in examples is how you actually access whatever the data members are in a specific derived Component class when all you have is a list of base class pointers in your Entity. I'm still new to programming in general, so if that's something that's "common knowledge," it's not common enough for all levels of skill.

The only way I can think of is to have a virtual function that returns an enum for the type of component and then cast it to that, but that seems like a hack job, and I can't help but feel like there must be a better way.

Edit: This is what I'm trying to accomplish:

//incomplete pseudocode

class DerivedComponent1 : public BaseComponent
{
  int a;
  int b;
};


class DerivedComponent2 : public BaseComponent
{
  string str;
  float c;
  float d;
};


class Entity
{
  vector<BaseComponent*> components;
};


class System
{
  void init()
  {
    entityInstance.components.push_back(derivedComponent1Instance);
    entityInstance.components.push_back(derivedComponent2Instance);
  }

  void doTask()
  {
    //access derived class members somehow
    entityInstance.components[0]->accessDerivedComponent1Members();
    entityInstance.components[1]->accessDerivedComponent2Members();
  }
};

This is in a system for a video game where an Entity can have zero or one of each type of component, like a health amount or a transform, and a System will only act on an entity that owns the specific components it needs to do its task, so behavior is customized by giving the Entity the correct Components for the desired behavior. The only way I can think of to access the component data is to add an enum with the component type and make doTask() something like this:

//incomplete pseudocode

enum ComponentType
{
  Derived1,
  Derived2
};


void System::doTask()
{
  DerivedComponent1* derived1 = nullptr;
  for(auto i : entityInstance.components)
  {
    if(i->getComponentType()==ComponentType::Derived1)
      derived1 = dynamic_cast<DerivedComponent1*>(i);
  }

  .
  .
  .

}

but my understanding is that if your code has to rely on casts, then it's probably bad code. My question is if there's a better way to do it.

Solution

My Way

The way I do it is not for the faint of heart: it involves variable-length structs, reinterpret casts, placement new, and manual destructor invocations, and it requires some knowledge of proper memory alignment if the data is to be packed as tightly as possible. It works like this:

Each component type is stored in its own contiguous memory block as a variable-length struct (VLS). There are no heap/free-store allocations or deallocations per component instance: instances are stored contiguously in an array that grows, similar to std::vector. You can use std::vector here, though it needs to behave like a free list, with O(1) removals from the middle that do not invalidate any other indices.
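As a rough illustration of "a growing array that behaves like a free list", here is a minimal sketch. It is a simplification rather than the implementation described above: it sits on top of std::vector, stores plain values instead of variable-length structs, and leaves a stale value in an erased slot instead of using placement new and manual destructor calls. The names ComponentPool, insert, erase, and contains are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical pool for one component type: a growable array whose freed
// slots are chained into a free list, so erasing from the middle is O(1)
// and never shifts or renumbers the other elements.
template <class T>
class ComponentPool
{
public:
    // Stores a component and returns its index.
    std::size_t insert(T value)
    {
        if (first_free != npos)
        {
            // Reuse the most recently freed slot.
            const std::size_t index = first_free;
            first_free = slots[index].next_free;
            slots[index].value = std::move(value);
            slots[index].occupied = true;
            return index;
        }
        slots.push_back(Slot{std::move(value), npos, true});
        return slots.size() - 1;
    }

    // Marks the slot as free and pushes it onto the free list.
    // The old value stays in place until the slot is reused.
    void erase(std::size_t index)
    {
        assert(index < slots.size() && slots[index].occupied);
        slots[index].occupied = false;
        slots[index].next_free = first_free;
        first_free = index;
    }

    T& operator[](std::size_t index)
    {
        assert(index < slots.size() && slots[index].occupied);
        return slots[index].value;
    }

    bool contains(std::size_t index) const
    {
        return index < slots.size() && slots[index].occupied;
    }

private:
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    struct Slot
    {
        T value;
        std::size_t next_free; // next free slot, or npos
        bool occupied;
    };

    std::vector<Slot> slots;
    std::size_t first_free = npos;
};
```

An index returned by insert stays valid until that particular slot is erased, no matter what is inserted or erased elsewhere, which is what lets entities refer to their components by index.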

A pair of indices (the type, plus the index into the container for that type), adding up to 48 bits, links each entity to the components associated with it, with -1 serving as the null terminator.

The real version actually stores the 48-bit link indices in parallel instead of directly into the same container as the component instances as a form of hot/cold field splitting (the links aren't traversed that often so much as traversing components of a particular type where it helps to reduce the stride), but I was too lazy to draw that into the diagram and that's a micro-optimization you can apply later.

And yes, there is casting going on (in my case a reinterpret cast which is even more ugly than a dynamic cast) to retrieve a component of a particular type from an entity when we write like:

MotionComponent* motion = entity.get<MotionComponent>();

But that is only one get method in the codebase that requires such a cast, and there's runtime checking going on to make sure the cast is legal to mitigate a lot of the usual problems associated with casting pointers.
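To make the idea of "one cast, guarded by a runtime check" concrete, here is a small sketch. It is not the reinterpret_cast-based code described above: it assumes each component type gets a small integer id the first time it is used, stores type-erased void* links, and uses static_cast once the ids match. The names type_id_of, ComponentLink, and attach are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical registry: hands out one small integer per component type
// the first time that type is seen.
inline std::size_t next_type_id()
{
    static std::size_t counter = 0;
    return counter++;
}

template <class T>
std::size_t type_id_of()
{
    static const std::size_t id = next_type_id();
    return id;
}

// A type-erased link from an entity to one of its components.
struct ComponentLink
{
    std::size_t type_id;
    void* data;
};

class Entity
{
public:
    // Non-owning: the component must outlive the entity in this sketch.
    template <class T>
    void attach(T& component)
    {
        assert(get<T>() == nullptr); // zero or one component per type
        links.push_back(ComponentLink{type_id_of<T>(), &component});
    }

    // The only cast lives here, and it happens only after the stored
    // type id has been checked against the requested type.
    template <class T>
    T* get() const
    {
        for (const ComponentLink& link : links)
        {
            if (link.type_id == type_id_of<T>())
                return static_cast<T*>(link.data);
        }
        return nullptr;
    }

private:
    std::vector<ComponentLink> links;
};

struct MotionComponent { float dx; float dy; };
struct HealthComponent { int hit_points; };
```

Because the id comparison happens before the cast, asking for a component type the entity does not own yields nullptr rather than a wrongly typed pointer.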

In my use cases it's not uncommon for various systems to process half a million components per frame, so it's rather tuned for a specific use case. It might be a bit overkill for your needs but it's something that has served me for a long time now if you want to roll up your sleeves and get down and dirty with the bits and bytes.

Your Way

The way you have it now is, if you'll forgive me, about the most inefficient way to do it for every kind of use case I can imagine. First, you're dynamically allocating every single component instance, which often results in a loss of spatial locality between components. Second, storing a separate container per entity, each with its own size, capacity, pointer, and dynamic allocation, adds considerable memory and performance overhead if you have a very large number of entities. Lastly, if you aren't storing components of a particular type contiguously in their own container, then whenever a system wants to fetch all components of a particular type (probably the most common query to an ECS), the ECS has to iterate through every single entity in the scene and check whether it has that component, which potentially means iterating through every component of each entity.

If that's okay, though, then one dynamic_cast in a central place is a perfectly understandable way to implement an ECS. Any time you want a syntax like this:

MotionComponent* motion = entity.get<MotionComponent>();

... that pretty much implies that there's some cast going on somewhere under the hood unless all that information is available statically at compile-time (in my case at least that's not feasible since plugin developers can introduce brand new component types into the software at runtime). The dynamic_cast is at least the safest kind of pointer cast there is and, if you centralize it to one get method as part of the entity interface, then it's not that big of a deal in practice.

As for your particular implementation, if you go with it, I'd seek to add something like this:

class Entity
{
public:
    template <class T>
    T* get() const
    {
        for (auto comp: components)
        {
            auto ptr = dynamic_cast<T*>(comp);
            if (ptr) 
                return ptr;
        }
        return nullptr;
    }

private:
    vector<BaseComponent*> components;
};

That centralizes the dynamic_cast to that one get<T> method in the codebase as opposed to having to litter casts and type checks all over your systems.
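For completeness, here is how that might look end to end. This is only a sketch: the component types, the add method, and the apply_damage system are invented for the example, and the entity does not own its components, to keep things short.

```cpp
#include <cassert>
#include <vector>

// dynamic_cast only works on polymorphic types, so the base needs at least
// one virtual function; a virtual destructor is the usual choice.
struct BaseComponent
{
    virtual ~BaseComponent() = default;
};

struct HealthComponent : BaseComponent { int hit_points = 100; };
struct MotionComponent : BaseComponent { float speed = 0.0f; };

class Entity
{
public:
    // Non-owning: the caller keeps the component alive (an assumption
    // made to keep this sketch short).
    void add(BaseComponent* component)
    {
        assert(component != nullptr);
        components.push_back(component);
    }

    template <class T>
    T* get() const
    {
        for (BaseComponent* comp : components)
        {
            if (T* ptr = dynamic_cast<T*>(comp))
                return ptr;
        }
        return nullptr;
    }

private:
    std::vector<BaseComponent*> components;
};

// A hypothetical system: it acts only on entities that own the component
// it needs, and it never casts on its own.
inline bool apply_damage(const Entity& entity, int amount)
{
    HealthComponent* health = entity.get<HealthComponent>();
    if (!health)
        return false;
    health->hit_points -= amount;
    return true;
}
```

The system code only ever sees typed pointers or nullptr, so the cast really does stay in one place.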

Another Way

Another way I've seen people do it, though it could easily be explosive in memory if implemented densely, is like this:

Basically you just think of it like an NxM matrix, where M is the number of entities, and N is the number of component types. You might store some marker or just a null pointer in the grid/matrix cells where no component of a particular type is attached to an entity. That's very straightforward but gets explosive in memory as you can probably quickly see.

A way to optimize the memory use then is to turn this into a sparse representation. For example, you might use a hash map for each column above (each hash table stores instances of a particular component type) which is associated to the row (entity) index as key, for example.

If you use this kind of rep and want this kind of syntax:

MotionComponent* motion = entity.get<MotionComponent>();

Then one straightforward way is to do something like this:

struct MotionComponent
{
    // Indicates which column (hash table, e.g.) to use in the 
    // above table where the motion components are stored.
    enum {type_index = ...};
    ...
};

... in which case you'd do something like:

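template <class T>
T* Entity::get() const
{
    enum {idx = T::type_index};
    // Assumes each comp_maps[idx] maps an entity index to a BaseComponent*,
    // so the iterator refers to a (key, value) pair.
    auto it = ecs.comp_maps[idx].find(this->entity_index);
    return (it != ecs.comp_maps[idx].end()) ? dynamic_cast<T*>(it->second): nullptr;
}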

To avoid the dynamic allocation per component, you can abstract at the hash map level like this as a very crude example:

class ComponentMap
{
public:
    virtual ~ComponentMap() {}
    virtual BaseComponent* find(int n) = 0;
};

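// ComponentType is assumed to derive from BaseComponent.
template <class ComponentType>
struct ComponentMapT: public ComponentMap
{
    virtual BaseComponent* find(int n) override
    {
         auto it = comps.find(n);
         return (it != comps.end()) ? &it->second: nullptr;
    }
    // Keyed by entity index; components are stored by value.
    std::unordered_map<int, ComponentType> comps;
};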

Then you can store a polymorphic list of ComponentMap which actually stores each component subtype instance by value instead of storing BaseComponent* with dynamic allocations per component instance.
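Here is one way the polymorphic list of maps might be put together. Treat it as a sketch under a few assumptions: the Ecs class and its all/get methods are made up, the type_index values are assigned by hand, and the constructor must create the maps in the same order as those indices. std::unordered_map stands in for whatever hash map you prefer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

struct BaseComponent { virtual ~BaseComponent() = default; };

struct MotionComponent : BaseComponent
{
    enum { type_index = 0 };
    float speed = 0.0f;
};

struct HealthComponent : BaseComponent
{
    enum { type_index = 1 };
    int hit_points = 100;
};

class ComponentMap
{
public:
    virtual ~ComponentMap() = default;
    virtual BaseComponent* find(int entity_index) = 0;
};

template <class ComponentType>
struct ComponentMapT : ComponentMap
{
    BaseComponent* find(int entity_index) override
    {
        auto it = comps.find(entity_index);
        return it != comps.end() ? &it->second : nullptr;
    }
    std::unordered_map<int, ComponentType> comps; // components held by value
};

// Hypothetical container: one map per component type, at position type_index.
class Ecs
{
public:
    Ecs()
    {
        // Order must match the type_index values declared above.
        maps.push_back(std::make_unique<ComponentMapT<MotionComponent>>());
        maps.push_back(std::make_unique<ComponentMapT<HealthComponent>>());
    }

    // Typed access for systems that walk every component of one type.
    template <class T>
    std::unordered_map<int, T>& all()
    {
        assert(static_cast<std::size_t>(T::type_index) < maps.size());
        return static_cast<ComponentMapT<T>&>(*maps[T::type_index]).comps;
    }

    // Entity-to-component lookup through the type-erased interface.
    // static_cast is safe only because slot type_index always holds T.
    template <class T>
    T* get(int entity_index)
    {
        return static_cast<T*>(maps[T::type_index]->find(entity_index));
    }

private:
    std::vector<std::unique_ptr<ComponentMap>> maps;
};
```

A system that wants every motion component would loop over ecs.all<MotionComponent>() directly, while code that starts from an entity would call ecs.get<MotionComponent>(entity_index).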

I suspect this solution can be quite performant if you do it this way. It does involve hash lookups to get from a particular entity to a component of a specific type, but the most critical execution paths in an ECS typically plow through all components of a particular type in the scene without going through the entities first. Your physics system might want to plow through all the motion components to transform them, and your rendering system might want to plow through all the sprites (in 2D) or meshes/models (in 3D) to render them. If a spatial index is involved, some system might plow through those components to update the spatial index, which the rendering system then uses to render the components inside the screen/frustum. This solution allows that by sequentially iterating through the hash map that stores all components of a given type, without any traversal from entity to associated component. How cache-friendly that iteration is depends on the hash map: open-addressing tables keep their elements in one contiguous array, whereas std::unordered_map is node-based and allocates each element separately.

OTHER TIPS

The Entity doesn't need to have a vector<Component *>. As you've noticed, that's practically useless.

What you'll have instead is a bunch of vector<FooComponent *>s, one for each type of Component.

In C++, you don't even need a class Component; your component-type-agnostic code can be templates instead. You'll probably have a FooComponent -> Entity -> BarComponent lookup if any of your components interact with (optional) other components.
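A sketch of what that could look like, under some assumptions: the set of component types is known at compile time, components are stored by value rather than by pointer, and all names (Store, World, integrate, Position, Velocity) are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <unordered_map>
#include <vector>

using EntityId = std::size_t;

// Plain structs: no common base class, no virtual functions.
struct Position { float x; float y; };
struct Velocity { float dx; float dy; };

// Hypothetical store for one component type: values packed in a vector,
// plus a lookup from entity id to position in that vector.
template <class T>
struct Store
{
    std::vector<T> values;
    std::vector<EntityId> owners; // owners[i] owns values[i]
    std::unordered_map<EntityId, std::size_t> index_of;

    void add(EntityId entity, T value)
    {
        assert(index_of.count(entity) == 0); // at most one per entity
        index_of[entity] = values.size();
        values.push_back(value);
        owners.push_back(entity);
    }

    T* find(EntityId entity)
    {
        auto it = index_of.find(entity);
        return it != index_of.end() ? &values[it->second] : nullptr;
    }
};

// The component types are fixed at compile time, so std::get<Store<T>>
// selects the right store without any cast.
template <class... Components>
class World
{
public:
    template <class T>
    Store<T>& store() { return std::get<Store<T>>(stores); }

private:
    std::tuple<Store<Components>...> stores;
};

// A system walks one component type and reaches the optional partner
// component through the owning entity (Velocity -> Entity -> Position).
template <class WorldT>
void integrate(WorldT& world, float dt)
{
    auto& velocities = world.template store<Velocity>();
    auto& positions = world.template store<Position>();
    for (std::size_t i = 0; i < velocities.values.size(); ++i)
    {
        if (Position* p = positions.find(velocities.owners[i]))
        {
            p->x += velocities.values[i].dx * dt;
            p->y += velocities.values[i].dy * dt;
        }
    }
}
```

The trade-off is that this only works when every component type is visible at compile time; it does not cover component types introduced at runtime.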

Licensed under: CC-BY-SA with attribution
