Does “variables should live in the smallest scope as possible” include the case “variables should not exist if possible”?

https://softwareengineering.stackexchange.com/questions/388435

22-02-2021
|

Question

According to the accepted answer on "Rationale to prefer local variables over instance variables?", variables should live in the smallest scope possible.
Simplify the problem into my interpretation, it means we should refactor this kind of code:

public class Main {
    private A a;
    private B b;

    public ABResult getResult() {
        getA();
        getB();
        return ABFactory.mix(a, b);
    }

    private getA() {
      a = SomeFactory.getA();
    }

    private getB() {
      b = SomeFactory.getB();
    }
}

into something like this:

public class Main {
    public ABResult getResult() {
        A a = getA();
        B b = getB();
        return ABFactory.mix(a, b);
    }

    private getA() {
      return SomeFactory.getA();
    }

    private getB() {
      return SomeFactory.getB();
    }
}

but according to the "spirit" of "variables should live in the smallest scope as possible", isn't "never have variables" have smaller scope than "have variables"? So I think the version above should be refactored:

public class Main {
    public ABResult getResult() {
        return ABFactory.mix(getA(), getB());
    }

    private getA() {
      return SomeFactory.getA();
    }

    private getB() {
      return SomeFactory.getB();
    }
}

so that getResult() doesn't have any local variables at all. Is that true?

Solution

No. There are several reasons why:

Variables with meaningful names can make code easier to comprehend.
Breaking up complex formulas into smaller steps can make the code easier to read.
Caching.
Holding references to objects so that they can be used more than once.

And so on.

OTHER TIPS

Agree, variables which are not necessary and does not improve the readability of the code should be avoided. The more variables which are in scope at a given point in code, the more complex that code is to understand.

I don't really see the benefit of the variables a and b in your example, so I would write the version without variables. On the other hand the function is so simple in the first place I don't think it matters much.

It becomes more of an issue the longer the function get and the more variables are in scope.

For example if you have

    a=getA();
    b=getB();
    m = ABFactory.mix(a,b);

At the top of a larger function, you increase the mental burden of understanding the rest of the code by introducing three variables rather than one. You have have to read through the rest of the code to see if a or b are used again. Locals which are in scope longer than they need to are bad for the overall readability.

Of course in the case where a variable is necessary (e.g to store a temporary result) or where a variable does improve the readability of the code, then it should be kept.

In addition to the other answers, I'd like to point something else out. The benefit to keeping a variable's scope small isn't just reducing how much code syntactically has access to the variable, but also reducing the number of possible control-flow paths that can potentially modify a variable (either by assigning it a new value or calling a mutating method on the existing object held in the variable).

Class-scoped variables (instance or static) have significantly more possible control-flow paths than locally-scoped variables because they can be mutated by methods, which can be called in any order, any number of times, and often by code outside the class.

Let's take a look at your initial getResult method:

public ABResult getResult() {
    getA();
    getB();
    return ABFactory.mix(this.a, this.b);
}

Now, the names getA and getB may suggest that they will assign to this.a and this.b, we can't know for sure from just looking at getResult. Thus, it's possible that the this.a and this.b values passed into the mix method instead come from the state of the this object from before getResult was invoked, which is impossible to predict since clients control how and when methods are called.

In the revised code with the local a and b variables, it is clear that there is exactly one (exception-free) control-flow from the assignment of each variable to its use, because the variables are declared right before they are used.

Thus, there is a significant benefit to moving (modifyable) variables from class-scope to local-scope (as well as moving (modifyable) variables from the outside of a loop to the inside) in that it simplifies control-flow reasoning.

On the other hand, eliminating variables like in your last example has less of a benefit, because it doesn't really affect control-flow reasoning. You also lose the names given to the values, which doesn't happen when simply moving a variable to an inner scope. This is a trade-off you have to consider, so eliminiating variables might be better in some cases, and worse in others.

If you don't want to lose the variable names, but still want to reduce the scope of the variables (in the case of them being used inside a larger function), you can consider wrapping the variables and their uses in a block statement (or moving them to their own function).

This is somewhat language-dependent, but I would say that one of the less obvious benefits of functional programming is that it encourages the programmer and reader of code to not need these. Consider:

(reduce (fn [map string] (assoc map string (inc (map string 0)))

Or some LINQ:

var query2 = mydb.MyEntity.Select(x => x.SomeProp).AsEnumerable().Where(x => x == "Prop");

Or Node.js:

 return visionFetchLabels(storageUri)
    .then(any(isCat))
    .then(notifySlack(secrets.slackWebhook, `A cat was posted: ${storageUri}`))
    .then(logSuccess, logError)
    .then(callback)

The last is a chain of calling a function on the result of a previous function, without any intermediate variables. Introducing them would make it much less clear.

However, the difference between the first example and the other two is the implied order of operation. This may not be the same as the order it's actually computed, but it's the order in which the reader should think about it. For the second two this is left to right. For the Lisp/Clojure example it's more like right to left. You should be a little wary of writing code that's not in the "default direction" for your language, and "middle out" expressions that mix the two should definitely be avoided.

F#'s pipe operator |> is useful partly because it allows you to write left-to-right things that would otherwise have to be right-to-left.

I would say no, because you should read "smallest scope possible" as "among existing scopes or ones that are reasonable to add". Otherwise it would imply that you should create artificial scopes (e.g. gratuitous {} blocks in C-like languages) just to ensure that a variable's scope does not extend beyond the last intended use, and that would generally be frowned upon as obfuscation/clutter unless there's already a good reason for the scope to exist independently.

Consider functions (methods). There it is neither splitting the code in the smallest possible subtask, neither the largest singleton piece of code.

It is a shifting limit, with delimiting logical tasks, into consumable pieces.

The same holds for variables. Pointing out logical data structures, into understandable parts. Or also simply naming (stating) the parameters:

boolean automatic = true;
importFile(file, automatic);

But of course having a declaration at the top, and two hundred lines further the first usage is nowadays accepted as bad style. This is clearly what "variables should live in the smallest scope possible" intends to say. Like a very near "do not reuse variables."

Whats somewhat missing as reason for NO is debugging / readabillity. Code should be optimized for that, and clear and concise names help a lot e.g. imagine a 3 way if

if (frobnicate(x) && (get_age(x) > 2000 || calculate_duration(x) < 100 )

this line is short, but already hard to read. Add a few more parameters, and that if spans multiple lines.

can_be_frobnicated = frobnicate(x)
is_short_lived_or_ancient = get_age(x) > 2000 || calculate_duration(x) < 100
if (can_be_frobnicated || is_short_lived_or_ancient )

I find this way easier to read and to communicate meaning - so I have no problem with intermediate variables.

Another example would be languages like R, where the last line is automatically the return value:

some_func <- function(x) {
    compute(x)
}

this is dangerous, is the return expexted or needed? this is clearer:

some_func <- function(x) {
   rv <- compute(x)
   return(rv)
}

As always, this is a judgement call - eliminate intermediary variables if they do not improve reading, otherwise keep or introduce them.

Another point can be debuggability: If the intermediary results are of interest, it can be best to simply introduce an intermediary, like in the R example above. How often this is called for is difficult to imagine and be careful what you check in - too much debugging variables are confusing - again, a judgement call.

Referring to just your title: absolutely, if a variable is unnecessary it should be deleted.

But “unnecessary” doesn’t mean that an equivalent program can be written without using the variable, otherwise we’d be told that we should write everything in binary.

The most common kind of unnecessary variable is an unused variable, the smaller the scope of the variable the easier it is to determine that it is unnecessary. Whether an intermediate variable is unnecessary is harder to determine, because it’s not a binary situation, it’s contextual. In fact identical source code in two different methods could produce a different answer by the same user depending upon past experience fixing problems in the surrounding code.

If your example code was exactly as represented, I would suggest getting rid of the two private methods, but would have little to no concern about whether you saved the result of the factory calls to a local variable or just used them as arguments to the mix method.

Code readability trumps everything except working correctly (correctly includes acceptable performance criteria, which is seldom “as fast as possible”).

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange