Question

In my empirical experience, modularizing software seems to decrease performance. Most of the overhead comes from communication between modules and redundancy in computation and storage; depending on the application, it can be of lesser or greater extent, or it can be optimized away by a compiler.

So I have been thinking about the benefits of, e.g., microservices versus larger systems that are harder to maintain.

What are some good criteria to use when choosing between what to keep modular vs what to keep as one system in my design process?


Solution

A module should isolate change.

The module boundary should act as a firewall against a design change propagating into the rest of the system.

Therefore a module should fully encapsulate a design decision. Changing that decision should only impact that module.

Therefore strive to keep as many of your design decisions isolated in modules as you can. That way when change hits you won't have to rewrite everything.
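As a rough illustration of that idea (my own minimal sketch with hypothetical names, not something prescribed by the answer): the decision of how results are persisted is hidden behind one small interface, so changing that decision later only touches the module that implements it.

```cpp
// Minimal sketch (hypothetical example): the *decision* of how results are
// persisted is hidden behind one small interface, so changing it
// (file -> database -> network) touches only this module.
#include <fstream>
#include <string>

class ResultSink {            // the module boundary / "firewall"
public:
    virtual ~ResultSink() = default;
    virtual void write(const std::string& record) = 0;
};

class FileSink : public ResultSink {   // one concrete design decision
public:
    explicit FileSink(const std::string& path) : out_(path) {}
    void write(const std::string& record) override { out_ << record << '\n'; }
private:
    std::ofstream out_;
};

// Callers depend only on ResultSink; swapping FileSink for, say, a
// DatabaseSink later does not propagate into the rest of the system.
void run_job(ResultSink& sink) { sink.write("job finished"); }
```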

If speed is more important and you believe change will never hit you, then stop solving the problem with software. Use a soldering iron.

Other tips

What are some good criteria to use when choosing between what to keep modular vs what to keep as one system in my design process?

I don't weigh such modular-versus-monolithic trade-offs when organizing systems from a bird's-eye view, and I work in a fairly performance-critical area (not as tight as embedded): VFX, with path tracers, real-time renderers, fluid dynamics, particle sims, physics, inverse kinematics applied to every character in massive scenes, endless mesh algorithms, interactive sculpting, texture painting, etc.

In spite of all this, I have never found the need to compromise modularity or abstractions in favor of performance. There are sometimes design-level performance concerns which shape a design, but not in a way that sacrifices these qualities.

[...] most of the overhead comes from communication between modules and redundancy in computation and storage; depending on the application, it can be of lesser or greater extent, or it can be optimized away by a compiler.

Then reduce the frequency of the communication. If paying a few extra pennies for a can of soda is relatively expensive, then don't buy one can of soda at a time. Buy a 24-pack, or a million cans of soda at once, and those few extra pennies for the purchase become absolutely trivial.

That's the design-level concern for me (which shouldn't be conflated with a desire to prematurely implement things as efficiently as possible). For example, don't modularize or abstract image compositing software intended to plow through pixels in real time at the granularity of communicating operations to perform on a single pixel. At that level the overhead of dynamic dispatch, and the optimization barriers it imposes (e.g. the compiler's inability to inline, given its lack of compile-time information), is tremendous. Furthermore, the dependencies on the module's per-pixel interface will paint you into a design corner where the bottlenecks that will likely result cannot be optimized away effectively without reconsidering the entire architecture. Abstract/modularize at the level of an entire image instead (often an aggregate of millions of pixels) and those costs are absolutely trivialized, without necessarily making the code much harder to write or maintain and without compromising on modularity.
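As a rough sketch of that contrast (illustrative names and a toy 8-bit image type of my own; the answer doesn't prescribe a specific interface):

```cpp
// Minimal sketch: the same compositing abstraction at two granularities.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Image { std::vector<std::uint8_t> px; };   // e.g. millions of pixels

// Too granular: one virtual call per pixel. The compiler cannot inline
// through the dispatch, and callers end up looping over pixels themselves.
class PixelOp {
public:
    virtual ~PixelOp() = default;
    virtual std::uint8_t apply(std::uint8_t src, std::uint8_t dst) const = 0;
};

// Coarser boundary: one virtual call per image. The loop lives inside the
// module, so the per-call overhead is amortized over millions of pixels and
// the implementation is free to vectorize, thread, or tile internally.
class ImageOp {
public:
    virtual ~ImageOp() = default;
    virtual void apply(const Image& src, Image& dst) const = 0;
};

class Add : public ImageOp {
public:
    void apply(const Image& src, Image& dst) const override {
        for (std::size_t i = 0; i < dst.px.size(); ++i)
            dst.px[i] = static_cast<std::uint8_t>(
                std::min(255, dst.px[i] + src.px[i]));
    }
};
```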

And that's typically what I'm concerned about these days with design: not whether or not to modularize, but how granular the module's operations should be. Any function with sufficiently meaty work to do trivializes the cost of whatever communication overhead results. So in areas where performance is anticipated to be a strong design-level requirement, my goal (to reduce the probability of costly design changes in hindsight) is to make sure the functions have sufficiently meaty work to perform and won't be invoked a billion times from some tight loop in the outside world only to do relatively trivial work inside. That leaves sufficient breathing room to optimize the implementation later, post-measurements, without having to redo the entire design. Crudely speaking: put the loopy code inside the module, rather than calling tiny functions in the module a gazillion times from the outside.
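A minimal sketch of the "loopy code inside the module" idea, using a hypothetical particle module (the answer mentions particle sims but not this exact interface):

```cpp
// Minimal sketch: the module's entry point does meaty, bulk work.
#include <cstddef>
#include <vector>

struct Particles {            // structure-of-arrays, owned by the module
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
};

// Coarse-grained entry point: called once per frame, loops inside.
void integrate(Particles& p, float dt) {
    for (std::size_t i = 0; i < p.x.size(); ++i) {
        p.x[i] += p.vx[i] * dt;
        p.y[i] += p.vy[i] * dt;
        p.z[i] += p.vz[i] * dt;
    }
}

// Anti-pattern (for contrast): a per-particle integrate(Particle&, float)
// that the outside world calls a million times per frame pins the design to
// per-element call overhead that cannot be optimized away later.
```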

Nuance

This is getting rather nuanced, but I can go into some details of the "design for bulk processing" mindset. For example, if you want to design the public interface of a module which operates on elements satisfying a predicate supplied by the caller, and it seems very performance-critical upfront, then the predicate doesn't have to operate on one element at a time. It can operate on, say, 64 elements at a time passed to it through the stack, and return a 64-bit integer with bits set for the elements which satisfy the predicate. That reduces the calling overhead to 1/64th of what you'd otherwise get. You can reduce it even further if the predicate operates on a larger array and returns a larger bitset, and the cost of constructing these arrays (provided you store them on the stack or in some other memory with high temporal locality) is generally trivial, and cheaper than the per-element dynamic dispatch/calling overhead.
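A minimal sketch of that bulk-predicate scheme (the 64-elements-in, 64-bit-mask-out shape comes from the paragraph above; the names and the float-filtering module are my own assumptions):

```cpp
// Minimal sketch: one indirect call classifies up to 64 elements at once.
#include <cstddef>
#include <cstdint>

// Bit i of the result is set if element i satisfies the predicate.
using BulkPredicate = std::uint64_t (*)(const float* values, std::size_t count);

std::uint64_t is_positive(const float* values, std::size_t count) {
    std::uint64_t mask = 0;
    for (std::size_t i = 0; i < count && i < 64; ++i)
        if (values[i] > 0.0f) mask |= (std::uint64_t{1} << i);
    return mask;
}

// The module makes one indirect call per 64-element chunk instead of one
// per element, then walks the returned bits.
template <typename Visit>
void for_each_match(const float* data, std::size_t n,
                    BulkPredicate pred, Visit visit) {
    for (std::size_t base = 0; base < n; base += 64) {
        std::size_t chunk = (n - base < 64) ? (n - base) : 64;
        std::uint64_t mask = pred(data + base, chunk);
        for (std::size_t i = 0; i < chunk; ++i)
            if (mask & (std::uint64_t{1} << i)) visit(base + i);
    }
}
```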

If you have a massive real-time video game with a Unit concept and subtypes like Human, Orc, and Elf, you don't have to design interfaces and modules that deal with a single Unit at a time. You can design an abstract Units collection interface with concrete types like Humans, Elves, and Orcs, and abstract functions which operate on a whole collection at a time. I've seen game devs jump through all sorts of hoops, like sorting polymorphic base pointers to improve branch prediction on dynamic dispatch and spatial locality, and implementing custom allocators, to try to get around the bottlenecks they created by designing these things at such a granular level. A much simpler solution, which doesn't get devs knee-deep in gory micro-optimizations in response to their hotspots, is simply to design at a coarser, aggregate level (which can still decompose the system into the same number of modules as before, but now each module's functions have much meatier work to do).
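A minimal sketch of that coarser design (the Unit/Orc/Elf names come from the paragraph above; the exact interface is my assumption):

```cpp
// Minimal sketch: abstract over whole collections of units, not single units.
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// One virtual call moves an entire homogeneous collection; each concrete
// type keeps its own contiguous storage, so there is no per-unit dispatch
// and no pointer-chasing across a polymorphic soup of individual Units.
class Units {
public:
    virtual ~Units() = default;
    virtual void move_toward(Vec2 target, float dt) = 0;
    virtual std::size_t count() const = 0;
};

class Orcs : public Units {
public:
    void move_toward(Vec2 target, float dt) override {
        for (std::size_t i = 0; i < pos.size(); ++i) {
            pos[i].x += (target.x - pos[i].x) * speed * dt;
            pos[i].y += (target.y - pos[i].y) * speed * dt;
        }
    }
    std::size_t count() const override { return pos.size(); }

    std::vector<Vec2> pos;   // contiguous per-type storage
    float speed = 1.0f;
};
// Humans, Elves: same interface, their own storage and tuning.
```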

Licensed under: CC-BY-SA with attribution