Legitimate “real work” in a constructor?

https://softwareengineering.stackexchange.com/questions/305464

10-12-2020
|

Question

I am working on a design, but keep hitting a roadblock. I have a particular class (ModelDef) that is essentially the owner of a complex node tree built by parsing an XML schema (think DOM). I want to follow good design principles (SOLID), and ensure that the resulting system is easily testable. I have every intention of using DI to pass dependencies into the constructor of ModelDef (so that these can easily be swapped out, if need be, during testing).

What I'm struggling with, though, is the creation of the node tree. This tree is going to be made up entirely of simple "value" objects which will not need to be independently tested. (However, I may still pass an Abstract Factory into ModelDef to assist with the creation of these objects.)

But I keep reading that a constructor should not do any real work (e.g. Flaw: Constructor does Real Work). This makes perfect sense to me if "real work" means constructing heavy-weigh dependent objects that one might later want to stub out for testing. (Those should be passed in via DI.)

But what about light-weight value objects such as this node tree? The tree has to be created somewhere, right? Why not via the constructor of ModelDef (using, say, a buildNodeTree() method)?

I don't really want to create the node tree outside of ModelDef and then pass it in (via constructor DI), because creating the node tree by parsing the schema requires a significant amount of complex code -- code that needs to be thoroughly tested. I don't want to relegate it to "glue" code (which should be relatively trivial, and will likely not be directly tested).

I have thought of putting the code to create the node tree in a separate "builder" object, but hesitate to call it a "builder", because it doesn't really match the Builder Pattern (which seem to be more concerned with eliminating telescoping constructors). But even if I called it something different (e.g. NodeTreeConstructor), it still feels like a bit of a hack just to avoid having the ModelDef constructor build the node tree. It has to be built somewhere; why not in the object that's going to own it?

Solution

And, besides what Ross Patterson suggested, consider this position which is the exact opposite:

Take maxims such as "Thou Shalt Not Do Any Real Work In Thy Constructors" with a grain of salt.
A constructor is, really, nothing but a static method. So, structurally, there is really not much difference between:

a) a simple constructor and a bunch of complex static factory methods, and

b) a simple constructor and a bunch of more complex constructors.

A considerable part of the negative sentiment towards doing any real work in constructors comes from a certain period of the history of C++ when there was debate as to precisely what state the object will be left in if an exception is thrown within the constructor, and whether the destructor should be invoked in such an event. That part of the history of C++ is over, and the issue has been settled, while in languages like Java there never was any issue of this kind to begin with.

My opinion is that if you simply avoid using new in the constructor, (as your intention to employ Dependency Injection indicates,) you should be fine. I laugh at statements like "conditional or looping logic in a constructor is a warning sign of a flaw".

Besides all that, personally, I would take the XML parsing logic out of the constructor, not because it is evil to have complex logic in a constructor, but because it is good to follow the "separation of concerns" principle. So, I would move the XML parsing logic into some separate class altogether, not into some static methods that belong to your ModelDef class.

Amendment

I suppose that if you have a method outside of ModelDef which creates a ModelDef from XML, you will need to instantiate some dynamic temporary tree data structure, populate it by parsing your XML, and then create your new ModelDef passing that structure as a constructor parameter. So, that could perhaps be thought of as an application of the "Builder" pattern. There is a very close analogy between what you want to do and the String & StringBuilder pair. However, I have found this Q&A which seems to disagree, for reasons which are not clear to me: Stackoverflow - StringBuilder and Builder Pattern. So, to avoid a lengthy debate over here as to whether the StringBuilder does or does not implement the "builder" pattern, I would say feel free to be inspired by how StrungBuilder works in coming up with a solution that suits your needs, and postpone calling it an application of the "Builder" pattern until that little detail has been settled.

See this brand new question: Programmers SE: Is “StringBuilder” an application of the Builder Design Pattern?

OTHER TIPS

You already give the best reasons to not do this work in the ModelDef constructor:

There's nothing "lightweight" about parsing an XML document into a node tree.
There's nothing obvious about a ModelDef that says it can only be created from an XML document.

It sounds like your class should have a variety of static methods like ModelDef.FromXmlString(string xmlDocument), ModelDef.FromXmlDoc(XmlDoc parsedNodeTree), etc.

I've heard that "rule" before. In my experience it is both true and false.

In more "classic" object orientation we talk about objects encapsulating state and behaviour. Thus an objects constructor should ensure that the object is initialized to a valid state (and signal a fault if the provided arguments does not make the object valid). Ensuring that an object is initialized to a valid state sure sounds like real work to me. And this idea has merits, if you have an object that only allows initialization to a valid state via the constructor and the object properly encapsulates it state so that each method that changes the state also checks that it doesn't change the state to something bad...then that object in essence guarantees that it is "always valid". That is a really nice property!

So the problem generally arrives when we try to break everything apart into small pieces to test and mock stuff. Because if an object is really properly encapsulated then you can't really get in there and replace the FooBarService with your mocked FooBarService and you (probably) can't just change values willy-nilly to suit your tests (or, it may take a lot more code than a simple assignment).

Thus we get the other "school of thought", which is SOLID. And in this school of thought it is most likely a lot more true that we shouldn't do real work in the constructor. SOLID code is often (but not always) more easy to test. But can also be harder to reason about. We break apart our code into small objects with a single responsibility, and thus most of our objects no longer encapsulate their state (and contain generally either state or behaviour). Validation code is generally extracted into a validator class and kept separate from the state. But now we've lost cohesion, we can no longer be sure that our objects are valid when we get them and to be completely sure we must always validate that the preconditions that we think that we have about the object is true before we attempt to do something with the object. (Of course, in general you do validation in one layer and then assume that the object is valid at lower layers.) But it's easier to test!

So who is right?

No one really. Both schools of thought have their merits. Currently SOLID is all the rage and everyone is talking about SRP and Open/Closed and all of that jazz. But just because something is popular doesn't mean that it is the correct design choice for every single application. So it depends. If you're working in a code base that is heavily following the SOLID principles then yes, real work in the constructor is probably a bad idea. But otherwise, look at the situation and try to use your judgement. What properties does your object gain from doing work in the constructor, what properties does it lose? How well does it fit with the overall architecture of your application?

Real work in the constructor isn't an antipattern, it can be quite the opposite when used in the correct places. But it should be documented clearly (along with which exceptions can be thrown, if any) and as with any design decision - it should fit within the general style used in the current code base.

There is a fundamental problem with this rule and it's this, what constitutes "real work"?

You can see from the original article posted in the question that the author attempts to define what "real work" is, but it's severely flawed. For a practice to be good it needs to be a well defined principle. By that I mean that as respects to software engineering the idea should be portable (agnostic to any language), tested, and proven. Most of what is argued in that article doesn't fit into that first criteria. Here are some indicators the author mentions in that article of what constitutes "real work" and why they are not bad definitions.

Use of the new keyword. That definition is fundamentally flawed because it's domain specific. Some languages don't use the new keyword. Ultimately what he's hinting at is that it should not be constructing other objects. However in many languages even the most basic values are themselves objects. So any value assigned in the constructor is also constructing a new object. That makes this idea limited to certain languages, and a bad indicator as to what constitutes "real work".

Object not fully initialized after the constructor finishes. This is a good rule, but it also contradicts several of the other rules mentioned in that article. A good example of how that could contradict the others is mentioned the question that brought me here. In that question someone is concerned about using the sort method in a constructor in what appears to be JavaScript because of this principle. In this example the person was creating an object that represented a sorted list of other objects. For purpose of discussion imagine that we had a unsorted list of objects and we needed a new object to represent a sorted list. We need this new object because some part of our software is expecting a sorted a list, and lets call this object SortedList. This new object accepts an unsorted list and the resulting object should represent a now sorted list of objects. If we were to follow the other rules mentioned in that doc, namely no static method calls, no control flow structures, nothing more than assignment, then the resulting object would not be constructed in a valid state breaking the other rule of it being fully initialized after the constructor finishes. To fix this we would need to do some basic work to make the unsorted list sorted in the constructor. Doing this would break the 3 other rules, making the other rules irrelevant.

Ultimately this rule of not doing "real work" in a constructor is ill defined and flawed. Trying to define what "real work" is an exercise in futility. The best rule in that article is that when a constructor finishes it should be fully initialized. There are a plethora of other best practices that would limit how much work in done in a constructor. Most of these can be summed up in SOLID principles, and those same principles would not prevent you from doing work in the constructor.

PS. I feel obligated to say that while I assert here that there is nothing wrong with doing some work in the constructor, it's not the place to be doing a bunch of work either. SRP would suggest that a constructor should do just enough work to make it valid. If your constructor has too many lines of code (highly subjective I know) then it's probably violating this principle, and could probably be broken into smaller better defined methods and objects.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange