Writing a good/readable constructor that needs a lot of calculation to fill its fields

https://softwareengineering.stackexchange.com/questions/325129

21-12-2020
|

Question

I have a class that has several fields which can only be filled consecutively by a lot of calculation. The 1st field can be set very easily. To fill the 2nd field, we take the 1st field's content and calcuate a lot. The 3rd, 4th and 5th field must be calculated together from field1 and field2 (it is not possible to calculate 3, 4 and 5 isolated from each other without severe performace loss). And the 6th field is calculated from field 1, 3 and 4.

I will use C# syntax for every code example.

class Foo
{
    private Type1 f1; // obtained directly from input parameter of constructor
    private Type2 f2; // from f1 we can calculate f2
    private Type3 f3; // from f1 and f2 we can calculate the triple (f3,f4,f5)
    private Type4 f4; // from f1 and f2 we can calculate the triple (f3,f4,f5)
    private Type5 f5; // from f1 and f2 we can calculate the triple (f3,f4,f5)
    private Type6 f6; // from f1, f3 and f4 we can calculate f6

    public Foo(Type1 input)
    {
        ... // What is a good way to write it?
    }
}

I would like to get some input regarding a good way to implement such a constructor. I am quite new to real-life software engineering (I just made some sandbox projects in university earlier) and don't know yet what the most important aspects are.

Here are the two ways I have in mind. Please comment on them, tell me about pros and cons I have not thought of, or give examples of how you would write such a constructor!

First version: static side-effect-free methods

public Foo(Type1 input)
{
    f1 = input;
    f2 = GetF2(f1);
    Tuple<Type3, Type4, Type5> bar = GetF3F4F5(f1, f2);
    f3 = bar.Item1;
    f4 = bar.Item2;
    f5 = bar.Item3;
    f6 = GetF6(f1, f3, f4);
}

private static Type2 GetF2(Type1 field1)
{
    Type2 field2 = new Type2();

    ... // calculating a lot, setting the correct value for field2

    return field2;
}

private static Tuple<Type3, Type4, Type5> GetF3F4F5(Type1 field1, Type2 field2)
{
    Type3 field3 = new Type3();
    Type4 field4 = new Type4();
    Type5 field5 = new Type5();

    ... // calculating a lot, setting the correct values for field3, 4 and 5

    return new Tuple<Type3, Type4, Type5>(field3, field4, field5);
}

private static Type6(Type1 field1, Type3 field3, Type4 field4)
{
    Type6 field6 = new Type6();

    ... // calculating a lot, setting the correct value for field6

    return field6;
}

pro:

At every line in the constructor, one knows exactly which fields are already set and which are not, making it clearly readable.
The calculations are encapsuled in side effect free methods, making them easily changeable.
Every method clearly shows what fields are needed for the calculation.

con:

Creating tuples has some overhead.
It might seem useless for the reader that a variable is defined, initialized and returned just to set a field to this value - instead of setting the field directly.

Second version: non-static methods manipulating the fields directly

public Foo(Type1 input)
{
    f1 = input;
    GetF2();
    GetF3F4F5();
    GetF6();
}

private void GetF2()
{
    f2 = new Type2();

    ... // calculating a lot with f1, setting the correct value for f2
}

private void GetF3F4F5()
{
    f3 = new Type3();
    f4 = new Type4();
    f5 = new Type5();

    ... // calculating a lot with f1 and f2, setting the correct values for f3, f4 and f5
}

private void Type6()
{
    f6 = new Type6();

    ... // calculating a lot with f1, f3, f4, setting the correct value for f6
}

pro:

Slightly shorter.
No Tuple overhead.
Probably a bit faster because the detour "initialize->return->set" is missing . (just a guess)

con:

If you take some line in the constructor, you cannot say which fields are already initialized and which aren't. (Or maybe you can because of the method names, but for my eyes it is not as clear as in the first version.)
The methods do not tell which fields they use for calculation.

Solution

The first approach is much, much better than the second approach.

The most important quality of software is readability, since the number of times we may write and re-write a line of code is much smaller than the number of times we may read a line of code. (Do you ever write code and then never read it again? I hope not. And before you re-write a line of code, you will certainly read it at least once, right?)

The performance penalty associated with returning multiple values from a method call is completely immaterial, especially since by your own admission, the calculations involved are extensive. So, returning the results is not only very cheap, but also, much cheaper than other operations that you already know that you have to do.

Also, since the calculation methods are going to be private, the compiler may even optimize them away, so there may even be no performance penalty whatsoever.

(In any case, if you really want to avoid the creation of the tuple, and since you are using C# syntax, you might as well use out parameters.)

(Also, keep in mind that the second approach would not even be an option if you were hoping to make your object immutable.)

There are two kinds of performance concerns: algorithmic concerns, and "save the clock cycles" type of concerns.

Algorithmic performance concerns have to do with using quicksort instead of bubble sort, using binary search instead of linear search, desynchronizing operations so that they can be performed in parallel instead of sequentially, improving protocols of communication so as to not require round-trips, etc.

"Save the clock cycles" type of performance concerns are about hacking, tweaking, and twisting stuff in order to save clock cycles here and there.

My advice to you would be to never consider a "save the clock cycles" type of performance concern as a "pro" or a "con" when deciding how to structure software. This type of concern will never make a difference, and in the extremely unlikely event that it may one day be found to make a difference, there will always be a very easy way to fix it.

If you really really want a suggestion for an alternative approach, consider the factory method pattern:

Make your constructor private, and have it accept exactly one parameter for each member variable to be initialized. (f1, f2, f3, f4, f5, and f6.)
Write a public static function which accepts the Type1 input parameter, performs all the calculations necessary storing the results in local variables named f1, f2, f3, f4, f5, and f6, and ends with return new Foo( f1, f2, f3, f4, f5, f6);.

OTHER TIPS

Consider separating use from construction.

Foo foo = new FooInjector(f1).build();

Now making Foo immutable is trivial. Now I can look at Foo's behavior without being distracted by f# construction. Sure Foo has a long constructor to handle but that's hidden in FooInjector, not spread throughout the code base.

If you find many classes (besides just Foo) need to do this to access the f# family you might want to look into the introduce parameter object refactoring.

I'd also go into why a constructor shouldn't be doing real work but that already has an answer here.

I don't see that it matters. All the processing is internal / private to the foo class. It's one massive side-effect anyway; the constructor is setting Foo object state. If Foo cannot trust itself then what's to do? Don't hold state and re-calculate all the fx objects every time?

Nonetheless I like the 1st option not because of side-effect management but because explicitly passing parameters expresses the dependencies for constructing an object. It makes the required construction order and participating objects unambiguous. And it does not make me have to read lots of code, as the 2nd example does, to figure out what's going on.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange