Question

My coding style for nested function calls is the following:

var result_h1 = H1(b1);
var result_h2 = H2(b2);
var result_g1 = G1(result_h1, result_h2);
var result_g2 = G2(c1);
var a = F(result_g1, result_g2);

I have recently changed to a department where the following coding style is very much in use:

var a = F(G1(H1(b1), H2(b2)), G2(c1));

The result of my way of coding is that, in case of a crashing function, Visual Studio can open the corresponding dump and indicate the line where the problem occurs (I'm especially concerned about access violations).

I fear that, in case of a crash due to the same problem programmed in the first way, I won't be able to know which function has caused the crash.

On the other hand, the more processing you put on a line, the more logic you get on one page, which enhances readability.

Is my fear correct or am I missing something, and in general, which is preferred in a commercial environment? Readability or maintainability?

I don't know if it's relevant, but we are working in C++ (STL) / C#.

Solution

If you felt compelled to expand a one-liner like

 a = F(G1(H1(b1), H2(b2)), G2(c1));

I wouldn't blame you. That's not only hard to read, it's hard to debug.

Why?

  1. It's dense
  2. Some debuggers will only highlight the whole thing at once
  3. It's free of descriptive names

If you expand it with intermediate results you get

 var result_h1 = H1(b1);
 var result_h2 = H2(b2);
 var result_g1 = G1(result_h1, result_h2);
 var result_g2 = G2(c1);
 var a = F(result_g1, result_g2);

and it's still hard to read. Why? It solves two of the problems and introduces a fourth:

  1. ~~It's dense~~
  2. ~~Some debuggers will only highlight the whole thing at once~~
  3. It's free of descriptive names
  4. It's cluttered with non-descriptive names

If you expand it with names that add new, good, semantic meaning, even better! A good name helps me understand.

 var temperature = H1(b1);
 var humidity = H2(b2);
 var precipitation = G1(temperature, humidity);
 var dewPoint = G2(c1);
 var forecast = F(precipitation, dewPoint);

Now at least this tells a story. It fixes the problems and is clearly better than anything else offered here, but it requires you to come up with the names.

If you do it with meaningless names like result_this and result_that because you simply can't think of good names, then I'd really prefer you spare us the meaningless name clutter and expand it using some good old whitespace:

int a = 
    F(
        G1(
            H1(b1), 
            H2(b2)
        ), 
        G2(c1)
    )
;

It's just as readable as, if not more so than, the one with the meaningless result names (not that these function names are that great):

  1. ~~It's dense~~
  2. ~~Some debuggers will only highlight the whole thing at once~~
  3. It's free of descriptive names
  4. ~~It's cluttered with non-descriptive names~~

When you can't think of good names, that's as good as it gets.

For some reason debuggers love new lines so you should find that debugging this isn't difficult:

(screenshot: a debugger stepping through the whitespace-expanded expression, highlighting a single call per line)

If that's not enough, imagine G2() was called in more than one place and then this happened:

Exception in thread "main" java.lang.NullPointerException
    at composition.Example.G2(Example.java:34)
    at composition.Example.main(Example.java:18)

I think it's nice that, since each G2() call would be on its own line, this style takes you directly to the offending call in main.

So please don't use problems 1 and 2 as an excuse to stick us with problem 4. Use good names when you can think of them. Avoid meaningless names when you can't.

Lightness Races in Orbit's comment correctly points out that these functions are artificial and have dreadfully poor names themselves. So here's an example of applying this style to some code from the wild:

var user = db.t_ST_User.Where(_user => string.Compare(domain,  
_user.domainName.Trim(), StringComparison.OrdinalIgnoreCase) == 0)
.Where(_user => string.Compare(samAccountName, _user.samAccountName.Trim(), 
StringComparison.OrdinalIgnoreCase) == 0).Where(_user => _user.deleted == false)
.FirstOrDefault();

I hate looking at that stream of noise, even when word wrapping isn't needed. Here's how it looks under this style:

var user = db
    .t_ST_User
    .Where(
        _user => string.Compare(
            domain, 
            _user.domainName.Trim(), 
            StringComparison.OrdinalIgnoreCase
        ) == 0
    )
    .Where(
        _user => string.Compare(
            samAccountName, 
            _user.samAccountName.Trim(), 
            StringComparison.OrdinalIgnoreCase
        ) == 0
    )
    .Where(_user => _user.deleted == false)
    .FirstOrDefault()
;

As you can see, I've found this style works well with the functional code that's moving into the object-oriented space. If you can come up with good names to do that in the intermediate style, then more power to you. Until then, I'm using this. But in any case, please find some way to avoid meaningless result names. They make my eyes hurt.

OTHER TIPS

On the other hand, the more processing you put on a line, the more logic you get on one page, which enhances readability.

I utterly disagree with this. Just looking at your two code examples calls this out as incorrect:

var a = F(G1(H1(b1), H2(b2)), G2(c1));

is hard to read. "Readability" does not mean information density; it means "easy to read, understand and maintain".

Sometimes, the code is simple and it makes sense to use a single line. Other times, doing so just makes it harder to read, for no obvious benefit beyond cramming more onto one line.

However, I'd also call you out on claiming that "easy to diagnose crashes" means code is easy to maintain. Code that doesn't crash is far easier to maintain. "Easy to maintain" is achieved primarily via the code being easy to read and understand, backed up with a good set of automated tests.

So if you are turning a single expression into a multi-line one with many variables just because your code often crashes and you need better debug information, then stop doing that and make the code more robust instead. You should prefer writing code that doesn't need debugging over code that's easy to debug.

Your first example, the single-assignment form, is unreadable because the chosen names are utterly meaningless. That might be an artifact of trying not to disclose internal information on your part; the true code might be fine in that respect, so we cannot say. In any case, it's long-winded due to extremely low information density, which does not generally lend itself to easy understanding.

Your second example is condensed to an absurd degree. If the functions had useful names, that might be fine and quite readable, because there isn't too much of it; but as-is it's confusing in the other direction.

After introducing meaningful names, you can look at whether one of the forms seems natural, or whether there's a golden middle to shoot for.

Now that you have readable code, most bugs will be obvious, and the others at least have a harder time hiding from you.

As always, when it comes to readability, failure lies in the extremes. You can take any good programming advice, turn it into a religious rule, and use it to produce utterly unreadable code. (If you don't believe me, check out the two IOCCC winners borsanyi and goren and take a look at how differently they use functions to render the code utterly unreadable. Hint: borsanyi uses exactly one function, goren many, many more...)

In your case, the two extremes are 1) using single expression statements only, and 2) joining everything into big, terse, and complex statements. Either approach taken to the extreme renders your code unreadable.

Your task, as a programmer, is to strike a balance. For every statement you write, it is your task to answer the question: "Is this statement easy to grasp, and does it serve to make my function readable?"


The point is, there is no single measurable statement complexity that can decide what is good to include in a single statement. Take, for example, the line:

double d = sqrt(square(x1 - x0) + square(y1 - y0));

This is quite a complex statement, but any programmer worth their salt should be able to immediately grasp what it does. It is a well-known pattern. As such, it is much more readable than the equivalent

double dx = x1 - x0;
double dy = y1 - y0;
double dxSquare = square(dx);
double dySquare = square(dy);
double dSquare = dxSquare + dySquare;
double d = sqrt(dSquare);

which breaks the well-known pattern into a series of seemingly meaningless simple steps. However, the statement from your question

var a = F(G1(H1(b1), H2(b2)), G2(c1));

seems overly complicated to me, even though it is one operation fewer than the distance calculation. Of course, that's a direct consequence of me not knowing anything about F(), G1(), G2(), H1(), or H2(). I might decide differently if I knew more about them. But that's precisely the problem: the advisable complexity of a statement strongly depends on the context and on the operations involved. And you, as the programmer, are the one who must look at this context and decide what to include in a single statement. If you care about readability, you cannot offload this responsibility to some static rule.

@Dominique, I think that in your question's analysis you're making the mistake of treating "readability" and "maintainability" as two separate things.

Is it possible to have code that is maintainable but unreadable? Conversely, if code is extremely readable, why would it become unmaintainable on account of being readable? I've never heard of any programmer who played these factors off against each other, having to choose one or the other!

In terms of deciding whether to use intermediate variables for nested function calls, in the case of 3 variables given, calls to 5 separate functions, and some calls nested 3 deep, I would tend towards using at least some intermediate variables to break that down, as you have done.

But I certainly do not go as far as to say function calls must never be nested at all. It's a question of judgment in the circumstances.

I would say the following points bear on the judgment:

  1. If the called functions represent standard mathematical operations, they're more capable of being nested than functions which represent some obscure domain logic whose results are unpredictable and cannot necessarily be evaluated mentally by the reader.

  2. A function with a single parameter is more capable of participating in a nest (either as an inner or outer function) than a function with multiple parameters. Mixing functions of different arities at different nesting levels is prone to leave code looking like a pig's ear.

  3. A nest of functions which programmers are accustomed to seeing expressed in a particular way - perhaps because it represents a standard mathematical technique or equation, which has a standard implementation - may be more difficult to read and verify if it is broken down into intermediate variables.

  4. A small nest of function calls that performs simple functionality and is already clear to read, and is then broken down excessively and atomised, is capable of being more difficult to read than one that was not broken down at all.

Both are suboptimal. Consider comments:

// Calculating torque according to Newton/Dominique, 4th ed. pg 235
var a = F(G1(H1(b1), H2(b2)), G2(c1));

Or specific functions rather than general ones:

var a = Torque_NewtonDominique(b1,b2,c1);

When deciding which results to spell out, keep in mind cost (copy vs reference, l-value vs r-value), readability, and risk, individually for each statement.

For example, there's no added value from moving simple unit/type conversions onto their own lines, because these are easy to read and extremely unlikely to fail:

var radians = ExtractAngle(c1.Normalize());
var a = Torque(b1.ToNewton(), b2.ToMeters(), radians);

Regarding your worry about analyzing crash dumps: input validation is usually far more important. The actual crash is quite likely to happen inside these functions rather than on the line calling them, and even if not, you don't usually need to be told exactly where things blew up. It's far more important to know where things started to fall apart than where they finally blew up, and that is what input validation catches.

Readability is the major portion of maintainability. Doubt me? Pick a large project in a language you don't know (preferably both the programming language and the natural language of the programmers), and see how you'd go about refactoring it...

I would put readability at somewhere between 80 and 90 percent of maintainability. The other 10-20 percent is how amenable the code is to refactoring.

That said, you effectively pass 2 variables to your final function (F). Those 2 variables are created using 3 other variables. You'd have been better off passing b1, b2, and c1 into F; or, if F already exists, create D, which does the composition for F and returns the result. At that point it's just a matter of giving D a good name, and it won't matter which style you use.

On a related note, you say that more logic on the page aids readability. That is incorrect: the metric isn't the page, it is the method, and the LESS logic a method contains, the more readable it is.

Readable means that the programmer can hold the logic (input, output, and algorithm) in their head. The more a method does, the less of it a programmer can understand. Read up on cyclomatic complexity.

Regardless of whether you are in C# or C++, as long as you are in a debug build, one possible solution is to wrap the functions, so the real work happens one call deeper:

returnType F(params)
{
    return RealF(params);
}

You can then still write the one-line expression

var a = F(G1(H1(b1), H2(b2)), G2(c1));

and be pointed at the problem simply by looking at the stack trace.

Of course, in case you call the same function multiple times on the same line, you cannot tell from the stack trace alone which call crashed. However, you can still identify it:

  • By looking at the function parameters
  • If the parameters are identical and the function has no side effects, then the two identical calls are interchangeable, and it does not matter which one crashed

This is not a silver bullet, but it is a not-so-bad halfway measure.

Not to mention that wrapping a group of functions can be even more beneficial for the readability of the code:

type CallingGBecauseFTheorem(T b1, C b2)
{
    return G1(H1(b1), H2(b2));
}

var a = F(CallingGBecauseFTheorem(b1, b2), G2(c1));

In my opinion, self-documenting code is better for both maintainability and readability, regardless of language.

The statement given above is dense, but "self-documenting":

double d = sqrt(square(x1 - x0) + square(y1 - y0));

When broken into stages (easier for testing, surely), it loses all context, as stated above:

double dx = x1 - x0;
double dy = y1 - y0;
double dxSquare = square(dx);
double dySquare = square(dy);
double dSquare = dxSquare + dySquare;
double d = sqrt(dSquare);

And obviously using variable and function names that state their purpose clearly is invaluable.

Even "if" blocks can be good or bad at self-documenting. This one is bad because you cannot easily force the first two conditions in order to test the third; all three are unrelated:

if ((Bill is the boss) && (i == 3) && (the carnival is next weekend))

This one makes more "collective" sense, and it is easier to create test conditions for:

if (iRowCount == 2) || (iRowCount == 50) || (iRowCount > 100)

And this statement is just a random string of characters, viewed from a self-documenting perspective:

var a = F(G1(H1(b1), H2(b2)), G2(c1));

Looking at the above statement, maintainability is still a major challenge if functions H1 and H2 both alter the same "system state variables" instead of being unified into a single "H" function. Someone will eventually alter H1 without even thinking that there is an H2 function to look at, and could break H2.

I believe good code design is very challenging because there are no hard-and-fast rules that can be systematically detected and enforced.

Licensed under: CC-BY-SA with attribution