Question

As far as I know, Data Oriented Design differs a lot from OOP. It encourages reuse of data, discourages polymorphism, and so on. And since SOLID relies heavily on OOP (especially Interface Segregation, for obvious reasons), how do you apply it with DOD? Using the functional paradigm, maybe? But how? Or if you don't / can't use SOLID with DOD, what's the common practice for writing clean code in DOD?

Thanks in advance


Solution

Robert Harvey's answer is fully correct, but let me focus a little more in-depth on the principles you can still apply in DoD.

First of all, DoD is a design technique aiming for performance. It is effectively an optimization technique one applies to the parts of a system which would otherwise become a bottleneck. But in most programs of reasonable size, these parts tend to make up only 10 to 20% of the whole system. So most of a program can often be written in a fully OOP style, following SOLID, while the parts which require high throughput are designed using DoD techniques. These parts can then be encapsulated behind a proper API (which can itself be seen as an OOP technique).
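To make that encapsulation point concrete, here is a minimal C++ sketch (all names hypothetical): the public interface looks like ordinary OOP and most callers never see past it, while the hot path behind it owns a flat, cache-friendly representation that can be rewritten freely.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: a conventional, encapsulated API on the outside,
// a DoD-friendly flat layout on the inside.
class TemperatureGrid {
public:
    TemperatureGrid(std::size_t width, std::size_t height)
        : width_(width), cells_(width * height, 0.0f) {}

    // Ordinary accessors for the ~80-90% of callers that are not hot.
    float at(std::size_t x, std::size_t y) const { return cells_[y * width_ + x]; }
    void  set(std::size_t x, std::size_t y, float v) { cells_[y * width_ + x] = v; }

    // The throughput-critical part: one tight loop over contiguous memory.
    // This internal detail can later be rewritten (SIMD, threads) without
    // touching any caller.
    void decayAll(float factor) {
        for (float& c : cells_) c *= factor;
    }

private:
    std::size_t width_;
    std::vector<float> cells_;  // flat storage, not a grid of cell objects
};
```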

Second, the SRP can still be applied. For example, when designing a data pipeline with several steps, each step or module of the pipeline will ideally have one responsibility; DoD will just create a different distribution of responsibilities than "classic OOP" would (see the sketch after the list below). As Robert already mentioned, other "clean code" techniques, especially

  • the DRY principle

  • the YAGNI and KISS principles

  • the Single Level of Abstraction principle

  • creating testable code and automated tests

  • choosing names well, creating sufficient documentation

  • doing code reviews, using version control, refactoring regularly, and other supplemental tasks for cleaning up the code

can be applied quite universally, whether you are following OO, DoD, or a different paradigm.
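To illustrate the SRP point above with a minimal sketch (names and data entirely hypothetical): each pipeline step is a function with exactly one responsibility, transforming bulk data rather than asking each record to be a smart object.

```cpp
#include <algorithm>
#include <vector>

// Step 1: decoding only.
std::vector<float> parseSamples(const std::vector<int>& raw) {
    std::vector<float> out(raw.size());
    std::transform(raw.begin(), raw.end(), out.begin(),
                   [](int r) { return r / 1000.0f; });
    return out;
}

// Step 2: filtering only.
void removeOutliers(std::vector<float>& samples, float limit) {
    samples.erase(std::remove_if(samples.begin(), samples.end(),
                                 [=](float s) { return s > limit; }),
                  samples.end());
}

// Step 3: aggregation only.
float average(const std::vector<float>& samples) {
    float sum = 0.0f;
    for (float s : samples) sum += s;
    return samples.empty() ? 0.0f : sum / static_cast<float>(samples.size());
}
```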

However, the ISP, the LSP and the DIP do not have much room in DoD, since they rely on inheritance and polymorphism, which are at odds with DoD.

As for the OCP: introducing certain kinds of new requirements, like new attributes in a DoD pipeline, may result in changes to several of the existing pipeline modules, so this can indeed run contrary to the OCP. On the other hand, pipeline processing bears a certain potential to create individual, reusable modules which might be "plugged together" in new ways, without changing their internals. So I guess one has to decide for the individual case where it is possible to follow the OCP.
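A hedged sketch of that "plugged together" idea: if every module conforms to a common stage signature, new behavior comes from composing a new list of stages rather than editing existing ones — the "open for extension" half of the OCP. The `Stage` alias and the stage bodies here are hypothetical.

```cpp
#include <functional>
#include <vector>

// A pipeline as a list of interchangeable stages over bulk data.
using Stage = std::function<void(std::vector<float>&)>;

void runPipeline(std::vector<float>& data, const std::vector<Stage>& stages) {
    for (const Stage& stage : stages) stage(data);
}

int main() {
    std::vector<float> data{3.0f, 1.0f, 4.0f};
    std::vector<Stage> stages{
        [](std::vector<float>& d) { for (float& x : d) x *= 2.0f; },  // scale
        [](std::vector<float>& d) { d.push_back(0.0f); },             // pad
    };
    runPipeline(data, stages);  // the same stages can be recombined later
}
```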

OTHER TIPS

  1. SOLID is object-oriented, by definition. You can't do SOLID without objects.

  2. You don't need SOLID to have "clean code," unless you're following the letter of Bob Martin's principles.

  3. Principles are just that: principles. They are not laws, mandates, or decrees. Ergo, you are not required to follow them. Principles are there to inform your software design decisions, not make them for you.

  4. Principles can, and do, conflict. So you have to choose the principles you're going to follow based on the desired objectives.

  5. DOD's objective is to provide good data locality in order to obtain the best possible performance. Its fundamental premise is that "standard" object-oriented techniques are poor at providing this locality, and are therefore incompatible with DOD (see the sketch below).
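That locality premise in miniature, as a hypothetical C++ comparison: an "array of structs" drags cold fields through the cache on every pass, while a "struct of arrays" lets the loop touch only the field it needs.

```cpp
#include <vector>

struct Monster {            // AoS: hot position mixed with cold data
    float x, y;
    char  name[64];         // cold, but pulled into cache regardless
    int   hitPoints;
};
void moveAll(std::vector<Monster>& ms, float dx) {
    for (Monster& m : ms) m.x += dx;   // strides over names and HP too
}

struct Monsters {           // SoA: each field in its own contiguous array
    std::vector<float> xs, ys;
    std::vector<int>   hitPoints;
};
void moveAll(Monsters& ms, float dx) {
    for (float& x : ms.xs) x += dx;    // dense, cache- and SIMD-friendly
}
```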

From my standpoint, it's just a matter of the granularity vs. coarseness of where you say "implementation details end here" and "abstract interfaces start there".

Software that performs real-time simulations of particles might process a boatload of particles every single frame, so if frame rates matter, I would suggest turning the particle into an "implementation detail". Don't model it as a full-blown object complete with ISP and DIP and LSP and so forth. Give yourself the breathing room to optimize (i.e., change) the representation of individual particles for optimal data locality, SIMD, multithreading, and so forth, without intrusive changes to your codebase.
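For instance, a minimal sketch (hypothetical names) of a particle hot path with no Particle object at all: the parallel-array layout is an internal detail, and the plain loop is exactly the kind that lends itself to auto-vectorization, intrinsics, or threading later.

```cpp
#include <cstddef>
#include <vector>

// Particle data as parallel arrays -- an implementation detail, not a class.
struct ParticleData {
    std::vector<float> posX, posY;
    std::vector<float> velX, velY;
};

void integrate(ParticleData& p, float dt) {
    const std::size_t n = p.posX.size();
    for (std::size_t i = 0; i < n; ++i) {
        p.posX[i] += p.velX[i] * dt;
        p.posY[i] += p.velY[i] * dt;
    }
}
```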

But that doesn't mean you have to haggle for pennies over a million-dollar purchase by avoiding SOLID in the entire ParticleSystem, which might be a collection that emits hundreds of thousands to millions of particles. Objects have a lot of useful properties, like the ability to maintain invariants over their mutable state and to free their memory automatically when they go out of scope or when nothing else references them; recognizing where in your domain they aid you vs. where they get in the way is, I think, going to be the most productive route for many. Yet bundling data and functionality together based on human intuition rather than actual access patterns can be counter-productive, not just for performance but also through increased coupling, in those critical granular cases: a single pixel of an image, a single particle of a particle system, or maybe even a single soldier in an army.
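A sketch of that coarser object (hypothetical and simplified): encapsulation and invariants are applied at the level of the system, which guards one macro-invariant — the parallel arrays always stay the same length — while the representation inside stays free to change.

```cpp
#include <cstddef>
#include <vector>

class ParticleSystem {
public:
    void emit(float x, float y, float vx, float vy) {
        posX_.push_back(x);  posY_.push_back(y);
        velX_.push_back(vx); velY_.push_back(vy);  // arrays grow together
    }

    std::size_t count() const { return posX_.size(); }

    void update(float dt) {
        for (std::size_t i = 0; i < posX_.size(); ++i) {
            posX_[i] += velX_[i] * dt;
            posY_[i] += velY_[i] * dt;
        }
    }

private:
    // Hidden SoA representation; callers never depend on this layout.
    std::vector<float> posX_, posY_, velX_, velY_;
};
```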

You find the balance that's suitable for your domain; it doesn't mean we have to abandon OOP outright unless you're some conceptual purist. Everyone's mileage might vary here, but I actually found a data-oriented mindset helpful in designing (coarser) objects, even in terms of basic things like maintaining invariants. Before, I was too obsessed with maintaining invariants over each grain of sand forming a sandcastle, which is often redundant with unit tests. Some degree of DOD helped me shift that focus to the integrity of the sandcastle, making sure its walls don't crumble at the slightest gust of wind. A focus on the hardware and access patterns forced me to design interfaces with more breathing room for changes (not just in response to performance requirements, but to unanticipated design requirements as well).

Granular Abstractions

There's a problem I've always butted heads with in the granular design of teeny objects that have complex interactions with others, and if it were only my code exhibiting such symptoms, then I'd just conclude that I'm a horrible object-oriented designer. But I see it all around me. The big problem I see is that the desire to encapsulate and hide data and maintain invariants over it seems to break down when objects want very complex interactions with each other.

Take a video game with fancy rules, like a spell that freezes an enemy creature and puts it under the player's control, while randomly sending off shards across the screen that also freeze and place under the player's control any enemies they intersect, and any frozen, player-controlled creature of this sort repeats the pattern while dealing cold damage to anyone who gets nearby. Or an amphibian which can breathe underwater and on land, but takes penalties to abilities like movement and health regeneration while on land, unless it's raining. Add hundreds to thousands more rules of this sort, as we might find in an AD&D rulebook, some only anticipated in future versions of the software.

What tends to happen, in my experience, is that the abstractions we build at the granular level of a single spell, or creature, or weapon start becoming increasingly monolithic, leaky (e.g., some horrid getter/setter interface for practically every member variable), or both, with a temptation to downcast to concrete types if there are polymorphic pointers/references in the mix. Given such complex interactions between them, they start wanting to become plain old C structs -- DTOs. From my standpoint, if I try to reconcile this with OO and SOLID, my best diagnosis is that we tried to hide data at too granular a level, gaining questionable benefits in maintaining micro-invariants (not meaty macro-invariants) over vital, hidden data we then jump through hoops to access in some abstract way, while doing backflips trying to design, re-design, and re-re-design our interfaces. So I see it like this: there's an appropriate level/threshold at which we can start hiding data without incurring heavy productivity and performance costs, and an inappropriate level where we will pay dearly. It's all about finding that sweet spot, as I see it. But I tend to think that sweet spot is coarser than most of us might think for all but the leakiest abstractions (like generic containers), and doesn't emphasize hiding trivial amounts of data, at least in the types of domains I tackle.

So I've tried to reconcile OO and SOLID principles in spite of being in an industry that's moving away from them, and what I find is just increased coarseness in my OO designs: Army, not Soldier; ParticleSystem, not Particle; Image, not Pixel; and definitely SandCastle, not SandGrain. Well, I've also gone quite functional, since I still like the ability to maintain invariants, like no-brainer thread-safety, and we can still favor open data architectures while being able to maintain invariants and reason about them if we work a lot with immutable objects and data. The bundling/encapsulation of data and logic ceases to be such a necessity if we can't mutate the underlying data post-initialization/construction. Then I favor OO and SOLID for the broader, coarser cases where mutable internal state is unavoidable. I also find the general concepts behind the actor model useful in some cases (some might argue that this is the truest form of OOP).
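A small sketch of that immutable flavor (hypothetical class): the data stays openly readable, yet the invariant is established once at construction and can never be broken afterwards, which also makes sharing across threads a non-issue.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

class SortedSamples {
public:
    explicit SortedSamples(std::vector<float> values)
        : values_(sorted(std::move(values))) {}  // invariant set up once

    // Open, read-only access: no setters means the invariant cannot break.
    const std::vector<float>& values() const { return values_; }

private:
    static std::vector<float> sorted(std::vector<float> v) {
        std::sort(v.begin(), v.end());
        return v;
    }
    const std::vector<float> values_;
};
```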

The Universality of Data

Now, this is probably a statement that will be highly contested, but I've found that programming in a way that's friendlier to the hardware, with consciousness of data formats/representations, actually seems to result in easier-to-read-and-change code in my domain than people enforcing their world views on a codebase. It might be because our personalities and perspectives are so distinct and inevitably clash, and we lack standards beyond the shallow aspects that our companies or languages impose to try to get us somewhat on the same page. Such standards, focused on code over data, never go far enough in my opinion. The hardware and the data are on the same page. And of course, I'm not claiming that handwritten SIMD intrinsics or assembly code have such qualities; there can always be too much of anything. But I've found that just embracing the hardware and the data (and its format/representation) as the means to get us on the same page can do so far more effectively than standards on how to write a loop or how to design interfaces or something of this nature.

When we make it about data and hardware, we can be largely on the same page even if we use completely different tools and languages to transform it, and even if we're polar opposites in personality. It's the ultimate form of modularity: you might have your very personal way of doing things, and I have mine, but we can be on the same page with respect to what data you need as input and what data you output back to me, without all sorts of adapters and layers to translate the data between us. So I've always found DOD helpful as the guiding factor behind the design that gets everyone on the same page, and something like SOLID as a tool to help implement certain designs (typically very coarse/high-level ones in my case) with less effort. DOD is about far more than performance, from my perspective. It's about getting back to the roots of the problem at hand, and bringing us back to the same page.
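A toy example of that idea (all names invented): the contract between two modules is nothing but a plain data record; how each side produces or consumes it stays its own private business, with no adapter layers in between.

```cpp
#include <cstdint>
#include <vector>

// The shared "page": a data format, not an object with behavior.
struct AudioChunk {
    std::uint32_t sampleRateHz;
    std::vector<float> samples;  // mono, normalized to [-1, 1]
};

// One teammate's module produces it (implementation is their business)...
AudioChunk recordChunk();

// ...another consumes it, agreeing only on the data.
float peakAmplitude(const AudioChunk& chunk) {
    float peak = 0.0f;
    for (float s : chunk.samples) {
        if (s > peak) peak = s;
        else if (-s > peak) peak = -s;
    }
    return peak;
}
```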

Licensed under: CC-BY-SA with attribution