Question

I have a hypothetical C++ code containing these two classes:

  • Master: it's big, does a lot of things and is meant to have only one instance;
  • Slave: quite the opposite. It can also do a lot of things, but it's small, and has many instances.

Every slave needs access to the master, so it is injected through the constructor:

class Slave {
    private:
        // A few small attributes
        Master& master;
    public:
        Slave(Master& master) : master(master) { }
        // May have lots of methods...
};

As there are many slaves, each one holding a reference to the master, a lot of memory is wasted on pointers that all point to the same thing. I would like to think that C++ compilers could find a way to optimize that Master& master attribute out of the Slave class, but I don't believe they do it - please correct me if I'm wrong.
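In fact, a quick sizeof check seems to confirm that the reference takes up real, pointer-sized storage in every instance (a minimal sketch; the exact size is platform-dependent):

#include <cstdio>

class Master { /* big, single-instance class */ };

class Slave {
    Master& master;  // occupies pointer-sized storage per instance
public:
    explicit Slave(Master& m) : master(m) { }
};

int main() {
    // Typically prints 8 on a 64-bit platform: the reference is
    // not optimized away; every Slave instance carries it.
    std::printf("sizeof(Slave) = %zu\n", sizeof(Slave));
}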

A solution would be to turn the Master class into a Singleton, but that is widely considered an anti-pattern. I think maybe a better approach is to turn the Master& master attribute into a static pointer:

class Slave {
    private:
        // Attributes...
    public:
        static Master* master;
        Slave() { }
        // Methods...
};

// The static member still needs exactly one out-of-class definition
// (in a single translation unit, pre-C++17):
Master* Slave::master = nullptr;

It's a pity that the reference needs to be converted to a pointer, but at least this eliminates the memory waste, while preserving the ability to "inject" the master and to use a mock for testing.
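Building on the snippet above, injecting the real master and swapping in a mock for tests might look roughly like this (MockMaster is hypothetical and assumes Master exposes virtual methods to override):

class MockMaster : public Master {
    // Overridden behaviour for tests...
};

int main() {
    Master real;
    Slave::master = &real;   // production "injection"

    MockMaster mock;
    Slave::master = &mock;   // in tests, every Slave now sees the mock

    Slave s;
}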

What do you guys think?

Solution

Well you're right about something being wrong. But I highly doubt worrying about memory usage is going to fix it.

Unless you can point to some real world data that shows you have a memory problem at the scale of these references, I wouldn't worry about it. This smacks of premature optimization.

Now that doesn't mean there aren't problems here. The master is forcing the slaves to depend on parts of it that they don't need. That's a violation of the interface segregation principle. If I don't need it then keep it away from me.

The slaves deserve to make their dependencies explicit. That doesn't mean they can only take primitives (see primitive obsession). They could take a parameter object of their own. But those parameter objects should be tailored to the individual needs of the slaves, not one-size-fits-all.
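A sketch of what that could look like (TaskSource is a made-up name): carve the part of Master that a slave actually uses into a small interface, and inject only that.

// Hypothetical narrow interface covering only what a Slave needs:
class TaskSource {
public:
    virtual ~TaskSource() = default;
    virtual int next_task() = 0;
};

// Master implements it alongside everything else it does:
class Master : public TaskSource {
public:
    int next_task() override { return 42; /* ... */ }
    // ...many other responsibilities the slaves never see...
};

// The Slave depends only on the part it actually uses:
class Slave {
    TaskSource& tasks;
public:
    explicit Slave(TaskSource& tasks) : tasks(tasks) { }
};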

Worry about memory after you've freed your slaves.

OTHER TIPS

The only time I see the per-instance pointer (Master* master) being a problem is when you are in a constrained environment such as embedded programming. In that case, it would be better to use a standard Singleton with a static accessor to provide the Master instance to all your Slaves. Then you only have one pointer to worry about.
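A minimal sketch of that, using the usual function-local static:

class Master {
public:
    // Static accessor: the one instance, created on first use.
    static Master& instance() {
        static Master the_master;
        return the_master;
    }
private:
    Master() = default;                  // no outside construction
    Master(const Master&) = delete;
    Master& operator=(const Master&) = delete;
};

// Slaves store nothing and just call Master::instance() when needed.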

I would further argue that Dependency Injection as a pattern would probably not be the right tool for an embedded environment.

Another avenue where dependency injection would be the wrong solution is if Master stored no state and was simply a container of pure functions. That's probably not common in C++, since you can use free functions without having to put them inside a class, but it's common enough in C# and Java due to their language design.
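In C++ that case needs no injection at all; the "Master" reduces to free functions (a trivial sketch with made-up names):

// A stateless "Master" in C++ can just be free functions in a namespace;
// there is nothing to store, so there is nothing to inject:
namespace master_ops {
    inline double scale(double value, double factor) {
        return value * factor;
    }
}
// A slave simply calls master_ops::scale(x, 2.0) directly.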

Dependency Injection became popular with server side code, and was even extended to work with desktop applications. Typically the cost of storing a reference (or pointer) in a class is trivial in comparison to the amount of memory the whole application is using.

For the sake of argument, let's say that Master stores the state for your application. If that is the case, then the following will most likely hold true:

  • The amount of memory controlled by Master will most likely be larger than the cumulative number of bytes from all the references in your Slave objects.
  • The amount of memory controlled by a Slave will probably be larger than your pointer to Master.
  • The total amount of memory used by your application will most likely be well within an acceptable amount.
  • You have a solution that allows easier testing of the relationship between Master and Slave. That's one of the wins from using DI.
  • By explicitly passing the Master to Slave objects you control access to your singleton a bit better, providing better isolation.

At some point you have to look at your use of any pattern as objectively as you can and decide if the trade-off you make to use the pattern is worth it. There are times where dependency injection is not worth the overhead. There are times when the overhead is worth it. Only you can answer if it is the right tool for your application. If you do, embrace both the good and the bad. This isn't a case where you can have your cake and eat it too.

I work in very performance-critical fields, so my thoughts might be skewed a bit in favor of compromising some maintainability in exchange for performance, which might not be a worthwhile exchange for everybody. In my case, neglecting efficient interface designs (not efficient implementations; the implementations can start off inefficient as long as we don't have to change the design to optimize) often incurs a lot of maintenance cost. There develops a strong need to redesign interfaces and deal with cascading breakages when the designs didn't leave enough breathing room for optimization behind the hood, which they won't if the objects are modeled at too granular a level. Speed is a massive quality metric when you work in areas like raytracing: a slow but correct raytracer is almost as worthless to the user as a fast but completely broken one.

Anyway, if dependency injection into objects is too expensive, then the way I look at it is that the objects themselves are too granularly modeled.

Model Your Designs at a Coarser Level

For performance-critical areas, I often find the need to model objects and interfaces at a coarser level: at the level of an Image, not a Pixel; at the level of a ParticleSystem, not a Particle; at the level of Monsters, not a Monster. So if storing a pointer to your Master or whatever you're trying to inject is too expensive for an individual Slave (which would imply that there are millions of slaves or more, or that each slave needs to be small enough to pack as many as possible into a single cache line), then the way I look at the problem is not that the fault lies in DI, but that the Slave is modeled at too granular a level.

It might instead be worth turning it into a Slaves class which models a basic collection of slaves.

  • Just vector<Slave>, for example -- cheapest and most straightforward data structure which incurs practically no additional memory cost per Slave (very close to zero bytes), especially if the vector is compacted. You can then use other data structures like sets and maps elsewhere if you need more sophisticated data structures for specific operations relating to slaves.

The cost of a pointer for DI is then completely trivialized: there's only the overhead of a single pointer for dozens, hundreds, thousands, or even millions of slaves. Further, when your interface works on Slaves or a range of slaves rather than a single Slave at a time (though scalar operations on a specific slave might additionally be provided for convenience in non-critical paths of execution), you have enormous breathing room to optimize the functions that relate to your slaves.

You can even start coming up with data representations that efficiently store many slaves in ways that would be impossible if you created each Slave as a separate object. For example, a hybrid AoSoA representation of the fields that constitute a slave gives optimal vectorization and cache hits for both random-access and sequential processing, and can plow through millions of slaves in the blink of an eye across multiple threads using vectorized code. Hot/cold field splitting avoids having to stride over fields of a slave that aren't accessed frequently in critical paths of execution. And you can apply all of these optimizations to your heart's content, without changing the design or breaking any external dependencies, when you model at the coarse level of a Slaves object rather than a scalar Slave object.

class Slaves
{
public:
    // Public interface. Does everything you could formerly do
    // with slaves individually, but now in bulk and optionally
    // to multiple slaves at once.

private:
    // All the privates can be changed to our heart's content
    // without breaking the public interface. We would not be
    // able to explore such data representations to aggregate
    // Slave data if we had a 'Slave' object instead.
    struct SlaveData
    {
        // AoSoA: one SoA block of 4 slaves' worth of data.
        float x4[4];
        float y4[4];
    };

    // Hot fields of all slaves, accessed frequently in critical
    // paths. Uses AoSoA for rapid sequential and reasonable
    // random access. Each entry holds 4 slaves' worth of data.
    // (vector_aligned64 stands in for some 64-byte-aligned vector.)
    vector_aligned64<SlaveData> data;

    // Names of all slaves -- cold field hoisted away from the
    // critical data and stored in parallel since it is not accessed by
    // performance-critical execution paths. This avoids slowing down
    // the critical paths with additional cache misses and page faults
    // in sequential loops through the hot fields above.
    vector<string> slave_names;

    // Pointer for DI.
    Master* master;
};

The above example is kind of extreme, and not a suggestion for a first draft, which shouldn't be concerned that much with the efficiency of implementation details until you measure. It just shows how this design gives you the breathing room to optimize in ways that you could not possibly do if you had an individual Slave object and interface instead of a Slaves aggregate interface.

If you need polymorphism with such a design, then you create polymorphic aggregates instead of polymorphic scalars. For example, instead of a Dog inheriting from Mammal, you'd have Dogs inheriting from Mammals, and the collection of Dogs can then be treated like a collection of Mammals through the Mammals interface. As you can probably already tell, that also trivializes dynamic dispatch to the point where it will probably never become a concern, even in the most performance-critical systems imaginable. Tricks game programmers used to use, like sorting polymorphic containers by subtype to minimize vtable cache misses, are completely unnecessary with this method, since each base pointer points to an aggregate, and each virtual method already invokes bulky logic that applies to multiple elements at once (with no dynamic dispatch at the individual element level). Thread locking is also simplified, since the bulky aggregate operations are coarse enough to lock appropriately, without the risk of locking too frequently at way too granular a level.
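A rough sketch of that aggregate polymorphism (Mammals, Dogs, and their fields are illustrative names):

#include <vector>

// Interface over a collection of mammals, not a single mammal:
class Mammals {
public:
    virtual ~Mammals() = default;
    virtual void update_all(float dt) = 0;  // one dispatch, many elements
};

class Dogs : public Mammals {
public:
    void update_all(float dt) override {
        for (auto& d : dogs)                // bulky loop behind one virtual call
            d.x += d.speed * dt;
    }
private:
    struct DogData { float x = 0.0f, speed = 1.0f; };
    std::vector<DogData> dogs;
};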

Maintenance Costs

This type of design does have maintenance costs, which multiply with the number of clients using the interface. Clients may now have to think about how to express what they're doing to multiple slaves at once as often as possible, because scalar operations that apply to one slave at a time may now be even slower than what you had with a scalar Slave object. They might have to construct lists of index ranges in advance to pass to a bulky aggregate operation, indicating which sub-ranges of slaves to transform in parallel, requiring a secondary data structure managed by the client just to indicate which slaves they're interested in. Usage of the interface will always be somewhat unwieldy compared to what we could achieve with a single Slave interface. It's similar to the awkwardness of working with vertex buffer objects in OpenGL: they are so much more efficient for doing things with vertices in bulk, but you can no longer just loop through a list of vertices in client code and conveniently render them one at a time in immediate mode. If there are a lot of non-critical cases where you want more convenience, you could create something like a SlaveProxy class which provides that scalar interface without storing anything more than an index and a pointer to the Slaves aggregate, and which invokes the aggregate operations with an index range of just [n, n+1) to process the nth slave.
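Such a proxy could be as thin as this (a sketch; Slaves and its move_range are hypothetical stand-ins for the aggregate and one of its ranged operations):

#include <cstddef>

// Minimal stand-in for the aggregate; move_range is a hypothetical
// ranged operation that transforms the slaves in [begin, end).
class Slaves {
public:
    void move_range(std::size_t begin, std::size_t end, float dx);
    // ...
};

class SlaveProxy {
public:
    SlaveProxy(Slaves& owner, std::size_t n) : owner(&owner), n(n) { }

    // Scalar convenience: forwards to the ranged op over [n, n+1).
    void move(float dx) { owner->move_range(n, n + 1, dx); }

private:
    Slaves* owner;   // just a pointer and an index -- no slave data
    std::size_t n;
};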

Design With Endless Breathing Room for Future Optimizations

So if you have a genuine performance concern here based on experience (either profiling metrics or good advance knowledge of how often your slaves will be processed and how many will typically be stored), for which I'll give you the benefit of the doubt, then the most useful optimization strategy, and one that addresses all your concerns at the design level, is to design a Slaves object and demolish the scalar Slave object. Make this aggregate interface responsible for doing everything to slaves through operations that apply to multiple slaves at once, plus, only as a convenience for non-critical paths, scalar operations that apply to individual slaves. With this you should never find yourself trapped in a design where anything you store for a single slave, or do with one, is impossible to optimize much further without changing the existing design and interfaces of your code and potentially rewriting huge sections of your codebase.

Licensed under: CC-BY-SA with attribution