Should types have methods in data oriented design?


First, you don't necessarily need to apply data-oriented design everywhere. It's ultimately an optimization, and even a performance-critical codebase still has a whole lot of parts which don't benefit from it.

I tend to think of it as obliterating structure in favor of big blocks of data that are more efficient to process. Take an image, for example. Representing its pixels efficiently generally requires storing a simple array of numeric values, not, say, a collection of user-defined abstract pixel objects that each carry a virtual table pointer (to take an exaggerated example).

Imagine a 4-component (RGBA) image that stores each color channel as a 32-bit float but uses only an 8-bit alpha for whatever reason (sorry, it's kind of a goofy example). Even if we used just a basic struct for the pixel type, we would normally end up requiring considerably more memory due to the structure padding required for alignment.

struct Image
{
    struct Pixel
    {
        float r;
        float g;
        float b;
        unsigned char alpha;
        // some padding (3 bytes, e.g., assuming 32-bit alignment
        // for floats and 8-bit alignment for unsigned char)
    };
    std::vector<Pixel> Pixels;
};

Even in this simple case, turning it into a flat array of floats with a parallel array of 8-bit alphas reduces the memory size and potentially improves sequential access speed as a result.

struct Image
{
    std::vector<float> rgb;
    std::vector<unsigned char> alpha;
};
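
To put rough numbers on that saving, here is a minimal sketch comparing the per-pixel footprint of the two layouts. The names PaddedPixel, padded_bytes, and split_bytes are mine, purely for illustration, and the exact sizeof result depends on the platform's alignment rules (16 bytes is typical where float is 4 bytes with 4-byte alignment).

```cpp
#include <cstddef>

// Array-of-structs pixel: three floats plus one 8-bit alpha.
// Where float is 4-byte aligned, the compiler typically pads this
// from 13 to 16 bytes so that every element of an array stays aligned.
struct PaddedPixel
{
    float r, g, b;
    unsigned char alpha;
};

// Payload bytes for n pixels stored as an array of PaddedPixel.
std::size_t padded_bytes(std::size_t n)
{
    return n * sizeof(PaddedPixel);
}

// Payload bytes for n pixels stored as a flat float array (3 per pixel)
// plus a parallel array of 8-bit alphas (1 per pixel). No padding is
// needed because each array holds a single type.
std::size_t split_bytes(std::size_t n)
{
    return n * (3 * sizeof(float) + sizeof(unsigned char));
}
```

Under those assumptions, a thousand pixels take 13000 bytes in the split layout versus typically 16000 in the struct layout.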

... and that's how we should be thinking initially: about data, memory layouts. Of course, images are already typically represented efficiently, and image processing algorithms are already implemented to process a large number of pixels in bulk.

Yet data-oriented design takes this further than usual by applying this kind of representation even to things that are considerably higher-level than a pixel. In a similar way, you might benefit from modeling a ParticleSystem instead of a single Particle, or even People instead of Person, to leave breathing room for optimizations.

But let's come back to the image example. This would tend to imply a lack of DOD:

struct Image
{
    struct Pixel
    {
        // Adjust the brightness of this pixel.
        void adjust_brightness(float amount);

        float r;
        float g;
        float b;
    };
    std::vector<Pixel> Pixels;
};

The problem with this adjust_brightness method is that it is designed, from an interface standpoint, to work on a single pixel. This can make it difficult to apply optimizations and algorithms which benefit from having access to multiple pixels at once. Meanwhile, something like this:

struct Image
{
    std::vector<float> rgb;
};
void adjust_brightness(Image& img, float amount);

... can be written in a way that benefits from accessing multiple pixels at once. We might even represent it like this with an SoA rep:

struct Image
{
    std::vector<float> r;
    std::vector<float> g;
    std::vector<float> b;
};

... which might be optimal if your hotspots relate to sequential processing. The details don't matter so much. To me, what's important is that your design leaves breathing room to optimize. The value of DOD is that putting this kind of thought in upfront gives you interface designs that can be optimized later as needed without intrusive design changes.
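
As a rough sketch of what the aggregate version could look like, here is one possible body for adjust_brightness over the flat rgb array. I'm assuming "brightness" just means adding a constant to every channel, which the original signature doesn't specify, so treat the arithmetic as a placeholder.

```cpp
#include <cstddef>
#include <vector>

struct Image
{
    std::vector<float> rgb;  // interleaved r, g, b values
};

// Adjusts every channel of every pixel in one pass.
// Assumption: brightness is a plain additive offset (illustrative only).
void adjust_brightness(Image& img, float amount)
{
    float* data = img.rgb.data();
    const std::size_t n = img.rgb.size();

    // One tight loop over contiguous floats: no per-pixel call, and
    // the function is free to change how it walks the data later.
    for (std::size_t i = 0; i < n; ++i)
        data[i] += amount;
}
```

Because callers only ever see the whole-image function, the loop could later be vectorized, split across threads, or moved to the GPU without touching any calling code.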

Polymorphism

The classic example of polymorphism also tends to focus on that granular one-thing-at-a-time mindset, like Dog inherits Mammal. In games, that can sometimes lead to bottlenecks where developers start having to fight against the type system: sorting polymorphic base pointers by subtype to improve temporal locality on the vtable, trying to make data of a particular subtype (Dog, e.g.) contiguously allocated with custom allocators to improve spatial locality on each subtype instance, etc.

None of these burdens need be there if we model at a coarser level. You can have Dogs inheriting abstract Mammals. Now the cost of virtual dispatch is reduced to once per type of mammal, not once per mammal, and all mammals of a particular type can be represented efficiently and contiguously.
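
Here is a minimal sketch of that coarser model. The member names (position, speed, update, count) and the update_all helper are hypothetical, chosen only to show where the single virtual call per type lands.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Abstract interface for a whole population of one kind of mammal.
struct Mammals
{
    virtual ~Mammals() = default;
    virtual void update(float dt) = 0;      // advance every animal
    virtual std::size_t count() const = 0;  // animals in this group
};

// All dogs live in contiguous arrays owned by a single object.
struct Dogs : Mammals
{
    std::vector<float> position;  // hypothetical 1D position per dog
    std::vector<float> speed;     // hypothetical speed per dog

    void add(float pos, float spd)
    {
        position.push_back(pos);
        speed.push_back(spd);
    }

    void update(float dt) override
    {
        // The virtual call lands here once; the loop has no dispatch.
        for (std::size_t i = 0; i < position.size(); ++i)
            position[i] += speed[i] * dt;
    }

    std::size_t count() const override { return position.size(); }
};

// One virtual call per kind of mammal, however many animals exist.
void update_all(std::vector<std::unique_ptr<Mammals>>& groups, float dt)
{
    for (auto& group : groups)
        group->update(dt);
}
```

With a thousand dogs, update_all still makes exactly one virtual call for them, and the data it touches is laid out however Dogs finds most efficient.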

You can still get all fancy and utilize OOP and polymorphism with a DOD mindset. The trick is to design things at a coarse enough level that you aren't fighting against the type system and working around the data types to regain control over things like memory layouts.

Interface Design

There is still interface design involved with DOD, at least as far as I see it, and you can have methods in your classes. It's still very important to design proper high-level interfaces, and you can still use virtual functions and templates and get very abstract. The practical difference I'd focus on is that you design aggregate interfaces, as with the adjust_brightness function above, which leave you the breathing room to optimize without cascading design changes throughout your codebase. We design an interface to process all the pixels of an entire image instead of one that processes a single pixel at a time.

DOD interfaces are often designed to process in bulk, typically with a memory layout that is optimal for the most performance-critical, linear-complexity sequential loops that have to access everything.

So if we take your example with Model, what's missing is the aggregate side of the interface.

struct Models {
    // Methods to process models in bulk can go here.

    struct Model {
        // vertex buffers
        GLuint Positions, Normals, Texcoords, Elements;
        // textures
        GLuint Diffuse, Normal, Specular;
        // further material properties
        GLfloat Shininess;
    };

    std::vector<Model> models;
};

This doesn't strictly have to be represented using a class with methods. It could be a function which accepts an array of structs. These details don't really matter so much. What matters is that the interface is mostly designed to process sequentially in bulk, while the data representation is designed optimally for that case.

Hot/Cold Splitting

Looking at your Person class, you might still be thinking somewhat in a classical interface kind of way (even though the interface here is just data). Again, DOD would primarily use a struct for a whole thing only if that was the optimal memory configuration for the most performance-critical loops. It's not about logical organization for humans, it's about data organization for machines.

struct Person {
    Person() : Walking(false), Jumping(false) {}
    float Height, Mass;
    bool Walking, Jumping;
};

First let's put this in context:

struct People {
    struct Person {
        Person() : Walking(false), Jumping(false) {}
        float Height, Mass;
        bool Walking, Jumping;
    };
};

In this case, are all the fields often accessed together? Let's say, hypothetically, that the answer is no. The Walking and Jumping fields are accessed only sometimes (cold), while Height and Mass are accessed repeatedly all the time (hot). In this case, a potentially more efficient representation might be:

struct People {
    std::vector<float> HeightMass;     // hot: interleaved height, mass pairs
    std::vector<bool> WalkingJumping;  // cold: interleaved walking, jumping flags
};

Of course, you can make two separate structs here, have one point to the other, etc. The key is that you design this ultimately from a memory layout/performance standpoint, ideally with a good profiler in hand and a solid understanding of the common user-end code paths.

From an interface standpoint, you design the interface with a focus towards processing people, not a person.
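
A small sketch of how that split might look behind a people-level interface. This is my own variation, not the only way to do it: I keep height and mass in separate arrays rather than interleaving them, store the flags as bytes instead of using vector<bool>, and the add and total_mass functions are hypothetical.

```cpp
#include <cstddef>
#include <vector>

struct People
{
    // Hot data: read on every pass, kept densely packed.
    std::vector<float> height;
    std::vector<float> mass;

    // Cold data: touched rarely, stored apart so it never occupies
    // cache lines that the hot loops are streaming through.
    std::vector<unsigned char> walking;
    std::vector<unsigned char> jumping;

    // Adds one person (not walking, not jumping) and returns their index.
    std::size_t add(float h, float m)
    {
        height.push_back(h);
        mass.push_back(m);
        walking.push_back(0);
        jumping.push_back(0);
        return height.size() - 1;
    }
};

// A hot loop: it reads the mass array and nothing else.
float total_mass(const People& people)
{
    float sum = 0.0f;
    for (float m : people.mass)
        sum += m;
    return sum;
}
```

Whether separate or interleaved arrays win depends on which fields your hot loops actually read together, which is exactly the sort of thing to settle with a profiler.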

The Problem

With that out of the way, on to your problem:

"I can only create models from this module, not from others. Should I move this to the type class Model while complexifying it?"

This is more of a subsystem design concern. Since your Model rep is all about OpenGL data, it should probably belong in the module that can properly initialize, destroy, and render it. It might even be a private/hidden implementation detail of this module, at which point you apply a DOD mindset within the implementation of the module.

The interface available to the outside world to add models, destroy models, render them, etc. should ultimately be designed for bulk, however. Think of it as designing a high-level interface for a container where the methods you would be tempted to add for each element instead end up belonging to the container, as in our image example above with adjust_brightness.

Complex initialization/destruction often needs a one-at-a-time design mentality, but the key is that you do this through an aggregate interface. Here you might still forgo the standard constructor and destructor for a Model in favor of initializing its GPU resources when it is added to the list of models to render and cleaning them up when it is removed from that list. It's somewhat back to C-style coding for the individual type (person, e.g.), though you can still get very sophisticated with C++ goodies for the aggregate interface (people, e.g.).
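
To illustrate, here is a rough sketch of such an aggregate interface. Everything in it is a stand-in: Handle replaces the GLuint names from your Model, the counter replaces real buffer and texture creation, and add, remove, and render_all are names I made up. No actual OpenGL calls are shown.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for GL object names (hypothetical; a real module would
// obtain these from OpenGL when creating buffers and textures).
using Handle = unsigned int;

class Models
{
    struct Model  // plain data: no constructor, no destructor
    {
        Handle positions;
        Handle diffuse;
        float shininess;
    };

    std::vector<Model> models_;
    Handle next_handle_ = 1;  // placeholder source of fake handles

public:
    // Initializes one model's resources and returns its index.
    std::size_t add(float shininess)
    {
        Model m;
        m.positions = next_handle_++;  // placeholder: create vertex buffer
        m.diffuse = next_handle_++;    // placeholder: create texture
        m.shininess = shininess;
        models_.push_back(m);
        return models_.size() - 1;
    }

    // Cleans up one model and removes it. Swap-and-pop keeps the array
    // dense, so the last model takes over the removed model's index.
    void remove(std::size_t index)
    {
        // placeholder: release the GPU resources of models_[index] here
        models_[index] = models_.back();
        models_.pop_back();
    }

    // Bulk operation: one call walks every model. A real version would
    // bind each model's resources and draw it; this placeholder only
    // counts the models it visits.
    std::size_t render_all() const
    {
        std::size_t drawn = 0;
        for (const Model& m : models_)
        {
            (void)m;  // placeholder for the per-model draw
            ++drawn;
        }
        return drawn;
    }

    std::size_t size() const { return models_.size(); }
};
```

The individual Model stays a dumb struct that the outside world never sees, while all the lifetime logic lives in the container's add and remove.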

"My question is, should I add methods to my type classes?"

In the examples you showed, typically no. It doesn't have to be a hard rule, but your types are modeling individual things, and leaving room for DOD often requires zooming out and designing interfaces which deal with many things. Mainly design for bulk, and you should be on your way.
