Avoiding side effects in immutable class constructor

https://softwareengineering.stackexchange.com/questions/395867

28-02-2021
|

Question

I rewrote a very long method in which some data is queried from a database, based on info about a particular account, which is queried first.

I split out the account info into an immutable inner class AccountInfo. This is a vastly simplified and altered version of what its constructor looks like:

private AccountInfo(String accountId, Optional<String> extraInfo) {
    final List<Project> projects;
    if (extraInfo.isPresent()) {
        // (I could probably improve this and remove the .get() call)
        projects = projectsDAO.fetchProjects(accountId, extraInfo.get());
        if (projects.isEmpty()) {
            this.projectIds = new ArrayList<>();
            this.projectNames = new ArrayList<>();
            this.unitIds = new ArrayList<>();
            return;
        }
    } else {
        projects = projectsDAO.fetchProjects(accountId);
    }

    projectIds = projects.stream().map(Project::getProjectId).collect(toList());
    projectNames = projects.stream().map(Project::getProjectName).collect(toList());

    final List<ProjUnit> units = unitsDAO.fetchUnitIds(projectIds);

    this.projectIds = projectIds;
    this.projectNames = projectNames;
    this.unitsIds = units.stream().map(ProjUnit::getUnitId).limit(1000).collect(toList());
}

As you can see, the constructor queries the database multiple times, via the DAOs (in the real code it's four DAOs instead of two). Now, as far as I know, it's bad practice to have effectful computations inside a constructor (please correct me if that's wrong). There's four potential solutions that come to my mind:

1. Make the class mutable

If I didn't insist on doing everything inside the constructor, I could first create an empty instance, and then call some methods to populate it:

accountInfo = new AccountInfo(accountId, extraInfo);
accountInfo.fetchProjects()
accountInfo.fetchUnits()

Problems:

Immutable classes are generally preferred when possible, which seems especially relevant since this is a class that stores information that really shouldn't be modified once queried.
It cannot be guaranteed that the caller calls the fetch methods after calling the constructor.

2. Supply the queried data to the constructor

Instead of querying the database inside the constructor, the caller could supply the data to the constructor via parameters.

Problems:

This would remove some of the advantage gained from separating the class from the original method in the first place, since the point was to take this functionality and organize it as a unit.
The second query depends on results from the first, so either almost all of the logic would have to be done before the constructor is called, or I would need two separate immutable classes, one that gets the queried info about projects and a second one that gets that object and the queried info about units. (Possibly even more in the real code.)

3. Use static factory methods

Instead of inside the constructor, the querying could be done in a static factory method, leaving only the assignment to the actual fields to the constructor.

Problems:

Since static factory methods are very similar in responsibility to constructors, I'm not sure if it's any better for them to have side effects.
The DAO objects are part of the outer class instance, so a static method would have to receive these as parameters to be able to use them, which seems somewhat inelegant. Though it could perhaps be seen as an advantage, since it would make their use explicit.

4. Use something like a builder

This would be somewhat similar to the mutable option but seems strictly better, since it's safer. It would allow to do something like

AccountInfo accountInfo = AccountInfo.Builder(accountId, extraInfo, projectsDAO, unitsDAO).build()

And since the Builder itself would have the wrong type, it would ensure that the caller has to call build(), which would handle all the side-effects.

Problems:

Similar to the static factory method option, it seems like it might be bad practice to have side effects in a build function (though perhaps this could be alleviated simply by renaming Builder and build to something that doesn't evoke the same expectations)
I think this also requires giving the Builder the DAOs as parameters (which, again, could potentially be seen as an advantage)

Solution

Some thoughts:

Now, as far as I know, it's bad practice to have effectful computations inside a constructor

That depends on what you mean by "effectful."

The purpose of a constructor is to "construct" an object. To do that, you may need to do many things. One of the things you may need to do is set up some state inside the new object. Is that what you consider "effectful?"

The whole point of a class is to store some data ("state") and some code that works with that data. We call that "encapsulation." That's kinda the whole point of a class: to shield class state from outside influence and prevent state from influencing the outside, except through well-defined boundaries called "properties" and "methods." This encapsulation is what allows us to work with data freely without having to worry about breaking the rest of the system.

Immutable classes are generally preferred when possible

Immutable classes are preferred when you need the benefits that immutable classes provide.

Instead of querying the database inside the constructor, the caller could supply the data to the constructor via parameters.

Problems:

This would remove some of the advantage gained from separating the class from the original method in the first place, since the point was to take this functionality and organize it as a unit.

Intuitively, this approach makes some sense. One class is responsible for gathering data, the other class is responsible for performing your computations. It makes things easier to test, it makes your logic easier to understand, and it follows the spirit of SRP.

Instead of inside the constructor, the querying could be done in a static factory method, leaving only the assignment to the actual fields to the constructor.

The purpose of factory methods is to provide an alternate way to construct objects that has more flexibility. For example, you can return one object from a family of types that derives from one base type. I don't know enough about your system to know whether or not you need that kind of flexibility.

Use something like a builder

...since the Builder itself would have the wrong type, it would ensure that the caller has to call build(), which would handle all the side-effects.

Making a builder is a very complex undertaking. Only do that if you absolutely need the benefits that a builder provides. I would only use a builder if you want to provide guidance to the person calling the builder in the form of intellisense, provide construction flexibility to the caller, or force a certain order onto the build steps.

Conclusion?

I think you might be worried a bit too much about ceremony and being "correct." If you could ignore the rule that says "do not perform computations in the constructor," would you? If your class didn't have to be immutable, what would you do? If you could take the approach that you liked the best, regardless of notions of "best practices," what approach would you choose?

Focus on the techniques that provide you the best balance of benefits and costs.

OTHER TIPS

"Classes should be immutable unless there's a very good reason to make them mutable."
Joshua Bloch - Effective Java

"Principles should be followed because you understand what you're getting out of following them. Not because you can quote them from some famous guys book."
Some anonymous guy on the internet

Immutable or not, encapsulated or not, a constructor that does too much work or not the real limitation of this design is that the constructor has an opinion about how to get this data. That means if you happen to have exactly that data laying around somewhere else this class is now useless to you.

If there is work that must be done to get this data you don't have to pack the code that does that work in your constructor. You can write a constructor that takes the data. Then you can just pass the data in. Dependency Injection is the fancy term for that but we used to just call it reference passing so don't let that name get you worked up.

The advantage is that now the class can simply be a consumer of this data and not a gatherer of the data. So if that data turns up elsewhere ready to go this class is ready for it. The disadvantage is now you have to think of a place to put the data gathering code. This principle is called separate use from construction.

Now pragmatically speaking, maybe you just don't know where else to put this data getting behavior. Many simply dismiss this issue as you being lazy but it's a real problem. Codebases impose their organizational style on you and you have to find places to put things where people can find them later. So maybe a factory class, builder, DSL, or any other creational pattern is just not practical for you. That's fine but it still doesn't mean you have to force this class to always do this work.

The laziest way (lazy can be a good thing) I know to avoid forcing this work to happen every time is simply make another constructor and call it from the one that knows how to do this data gathering work. Now you can bypass the data gathering if you have another way to gather it. The data gathering is now simply an overridable default behavior.

The cost of doing that is that the class still has to know how to gather what it needs and so has a hard coded dependency on whatever it needs to do that. That could have been separated so the class could be reused even if that stuff goes away. Which you only care about if violating OCP and directly rewriting the class when you need to do that is an issue. So is this thing published in a widely used library or is your development group the only ones that sees it directly? If not the only real cost of violating OCP is after you rewrite it you have to test the class all over again because you're losing your untouched and battle tested code to the rewrite.

Oh, and it turns out doing it this way lets you still be immutable. Neat, if you care about that.

Those are the kind of things you should think about before applying these principles. Always know why you care.

Also, if you really want to write code that looks like this,

accountInfo = new AccountInfo(accountId, extraInfo);
accountInfo.fetchProjects()
accountInfo.fetchUnits()

while imposing an order and allowing AccountInfo to be immutable then you can create an internal Domain Specific Language (iDSL) that leverages the ideas of Joshua Bloch's Builder. See Java's standard StringBuilder for an example. Josh's Builder gives you an immutable result but imposes no order on the methods called. The DSL can give you an immutable result while imposing an order on construction. The using code would look like this:

 AccountInfo accountInfo = new AccountInfo.Builder(accountId, extraInfo)
     .fetchProjects()
     .fetchUnits()
     .build()
 ;

You can't call those fetches out of order because in this DSL they're hanging off of different types and they return different types. Once AccountInfo exists it's perfectly immutable.

This is the other extreme. Setting up a DSL is a lot of work. Much more than tacking on another constructor. So much so that I wouldn't recommend it unless AccountInfo is something you see being constructed over and over in your code base. When that's the case though this work can save a lot of pain.

If you're looking at this and thinking, "this is tied to the data gathering implementation just as much as the original constructor", well you're right. If you want, you can separate the building order knowledge from the data gathering implementation knowledge.

public AccountInfo accountInfoFactoryMethod(AccountInfoBuilder accountInfoBuilderImp) {
    accountInfoBuilderImp
        .builder(accountId, extraInfo)
        .fetchProjects()
        .fetchUnits()
        .build()
    ;
}

You say that you don't want to move the projectsDAO.fetchProjects out of this constructor because then "... almost all of the logic would have to be done before the constructor is called..." but it seems to me that the logic outside this method is exactly what's determining which fetchProjects method to call.

What I mean is that you have logic outside of this class that determines whether any extraInfo is passed in, and since that is the only logic that determines which fetchProjects method to call, it should be made more directly responsible.

So I would go with your option 2 because the problems you cited aren't problems introduced by the option, but rather they exist regardless of the option chosen.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange