Question

I've been working on some Perl libraries for data mining. The libraries are full of nested loops for gathering and processing information. I'm using strict mode, and I always declare my variables (with my) before the outermost loop. For instance:

# Deliberately trivial code, for clarity:
use strict;
use warnings;

my $flag = 1;
my ($v1, $v2);

while ($flag) {
  for $v1 (1 .. 1000) {

    # Lots and lots of code...

    $v2 = $v1 * 2;
  }
}

From what I've read here, it is better for performance to declare them outside the loop. However, maintaining my code is becoming increasingly difficult, because some variables end up declared far away from where they are actually used.

Something like this would be easier to maintain:

use strict;
use warnings;

my $flag = 1;

while ($flag) {
  for my $v1 (1 .. 1000) {

    # Lots and lots of code...

    my $v2 = $v1 * 2;
  }
}

I don't have much experience with Perl, since I come mostly from C++. At some point I would like to open source most of my libraries, so I would like them to be as pleasing as possible to Perl gurus.

From a professional Perl developer's point of view, which of these options is more appropriate?


Solution

The general rule is to declare every variable as late as possible.

If the value of a variable doesn't need to be kept across iterations of a loop then declare it inside the loop, or as the loop control variable for a for loop.

If it needs to persist across loop iterations (like your $flag), declare it immediately before the loop.
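As a small illustration of these two rules, here is a sketch with hypothetical input data (@lines and the word-counting task are made up for the example): the running total must survive every iteration, so it is declared just before the loop, while the per-line word list is only needed for one pass, so it lives inside the loop body.

```perl
use strict;
use warnings;

my @lines = ('a b', 'c d e');        # hypothetical input

my $total_words = 0;                 # must persist across iterations: declared just before the loop
for my $line (@lines) {              # loop control variable declared by the for itself
    my @words = split ' ', $line;    # only needed for this iteration: declared inside
    $total_words += @words;          # an array in scalar context gives its element count
}

print "$total_words\n";
```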

Yes, there is a small speed cost when a variable is discarded and reallocated every time a block is executed, but programming and maintenance costs are by far the more important kind of efficiency and should always come first.

You shouldn't be optimising your code before it has been made to work and found to be running too slowly; and even then, moving declarations to the top of the file is a long way down the list of compromises that are likely to make a useful difference.
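If a loop really does turn out to be too slow, it is worth measuring before moving declarations around. The sketch below uses the core Benchmark module to compare two hypothetical subroutines (declared_outside and declared_inside are names made up for this example) that differ only in where $v2 is declared. Timings depend on the machine and the Perl version, so no particular result is implied here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical variant 1: $v2 declared once, outside the loop, and reassigned on every pass.
sub declared_outside {
    my ($v2, $sum) = (0, 0);
    for my $v1 (1 .. 1000) {
        $v2 = $v1 * 2;
        $sum += $v2;
    }
    return $sum;
}

# Hypothetical variant 2: $v2 declared inside the loop body, in the smallest possible scope.
sub declared_inside {
    my $sum = 0;
    for my $v1 (1 .. 1000) {
        my $v2 = $v1 * 2;
        $sum += $v2;
    }
    return $sum;
}

# Run each variant for at least 1 CPU second and print a rate comparison table.
cmpthese(-1, {
    outside => \&declared_outside,
    inside  => \&declared_inside,
});
```

Both variants compute the same result, so any difference in the table comes from the declaration placement alone.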

OTHER TIPS

Optimize for readability. This means declaring variables in the smallest possible scope. Ideally, I can see the variable declaration and all usages of that variable at the same time. We can only keep a very limited amount of context in our heads, so declaring variables near their use makes it easier to understand, write, and debug code.

Which variant performs better is difficult to estimate, and difficult to measure, because the effect will be rather small. But if performance is roughly equivalent, we might as well use the more readable variant.

I personally often try to write code in single-assignment form, where variables are never reassigned and mutators like push @array, $elem are avoided. This ensures that a variable's name and its value are always interchangeable, which makes it easier to reason about the code. It also implies that every variable declaration is an initialization, which removes a whole class of errors.
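A minimal sketch of the difference, using a hypothetical @numbers array: the first version declares an empty array and then mutates it with push, while the single-assignment version declares and initialises the result in one statement with map, and never touches it again.

```perl
use strict;
use warnings;

my @numbers = (1, 2, 3);   # hypothetical input

# Mutating style: declared empty, then changed piece by piece.
my @doubled_by_push;
push @doubled_by_push, $_ * 2 for @numbers;

# Single-assignment style: declared and initialised in one statement,
# and never modified afterwards.
my @doubled = map { $_ * 2 } @numbers;

print "@doubled\n";
```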

You should declare variables when you're ready to define them, unless you need to access the value in a larger scope. Even then, passing the value back explicitly will be easier to follow.
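For example, rather than declaring a $found variable before a search loop just so the loop can fill it in, the loop can live in a small subroutine that returns the answer. The sketch below assumes a hypothetical helper named first_over; the core List::Util module's first function does the same job in one line.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical helper: return the first value greater than $limit,
# or undef (an empty list in list context) when none qualifies.
# The answer is returned explicitly, so no variable has to be declared
# in an enclosing scope just to carry it out of the loop.
sub first_over {
    my ($limit, @values) = @_;
    for my $v (@values) {
        return $v if $v > $limit;
    }
    return;
}

# Each result is declared at the point where it is defined.
my $first_big = first_over(10, 3, 12, 15);
my $same      = first { $_ > 10 } 3, 12, 15;   # List::Util equivalent

print "$first_big $same\n";
```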

The particular example you have given (declaring a loop variable) probably has no performance penalty. As the link you quoted says, the performance difference comes down to whether the variable is initialised inside the loop. In the case of a for loop, it is initialised either way.
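In fact, declaring the loop variable outside a foreach loop buys nothing at all: according to perlsyn, a foreach loop over a variable previously declared with my reuses that name but localizes it to the loop, so it regains its former value afterwards. A minimal sketch:

```perl
use strict;
use warnings;

my $v1 = 'before';        # lexical declared outside the loop, as in the question
my @seen;
for $v1 (1 .. 3) {        # foreach localizes $v1 to the loop
    push @seen, $v1;      # 1, then 2, then 3
}

print "@seen\n";
print "$v1\n";            # the pre-loop value is restored once the loop exits
```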

I almost always declare the variables in the innermost scope possible. It reduces the chances of making mistakes. I would only alter that if performance became a problem in a specific loop.
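One common mistake this avoids is a value leaking from one iteration into the next. In the sketch below (the @records data and the big/small tagging rule are hypothetical), a variable declared outside the loop keeps whatever the previous iteration assigned, while a variable declared inside starts out undefined on every pass.

```perl
use strict;
use warnings;

my @records = (3, 8, 2);   # hypothetical input

# Outer-scope declaration: $shared_tag is never reset, so the 'big'
# assigned for 8 leaks into the next record, 2.
my @leaky;
my $shared_tag;
for my $r (@records) {
    $shared_tag = 'big' if $r > 4;
    push @leaky, $shared_tag // 'small';
}

# Inner-scope declaration: $tag is a fresh, undefined variable each time.
my @correct;
for my $r (@records) {
    my $tag;
    $tag = 'big' if $r > 4;
    push @correct, $tag // 'small';
}

print "@leaky\n";
print "@correct\n";
```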

Licensed under: CC-BY-SA with attribution