What makes Perl code maintainable?

https://stackoverflow.com/questions/4349142

30-09-2019
|

Question

I've been writing Perl for several years now and it is my preferred language for text processing (many of the genetics/genomics problems I work on are easily reduced to text processing problems). Perl as a language can be very forgiving, and it's possible to write very poor, but functional, code in Perl. Just the other day, my friend said he calls Perl a write-only language: write it once, understand it once, and never ever try to go back and fix it after it's finished.

While I have definitely been guilty of writing bad scripts at times, I feel like I have also written some very clear and maintainable code in Perl. However, if someone asked me what makes the code clear and maintainable, I wouldn't be able to give a confident answer.

What makes Perl code maintainable? Or maybe a better question is what makes Perl code hard to maintain? Let's assume I'm not the only one that will be maintaining the code, and that the other contributors, like me, are not professional Perl programmers but scientists with programming experience.

Solution

What makes Perl code unmaintainable? Pretty much anything that makes any other program unmaintainable. Assuming anything other than a short script intended to carry out a well defined task, these are:

Global variables
Lack of separation of concerns: Monolithic scripts
NOT using self-documenting identifiers (variable names and method names). E.g. you should know what a variable's purpose is from its name. $c bad. $count better. $token_count good.
- Spell identifiers out. Program size is no longer of paramount concern.
- A subroutine or method called doWork doesn't say anything
- Make it easy to find the source of symbols from another package. Either use explicit package prefix, or explicitly import every symbol used via use MyModule qw(list of imports).
Perl-specific:
- Over-reliance on short-cuts and obscure builtin variables
- Abuse of subroutine prototypes
- not using strict and not using warnings
Reinventing the wheel rather than using established libraries
Not using a consistent indentation style
Not using horizontal and vertical white space to guide the reader

etc etc etc.

Basically, if you think Perl is -f>@+?*<.-&'_:$#/%!, and you aspire to write stuff like that in production code, then, yeah, you'll have problems.

People tend to confuse stuff Perl programmers do for fun (e.g., JAPHs, golf etc) with what good Perl programs are supposed to look like.

I am still unclear on how they are able to separate in their minds code written for IOCCC from maintainable C.

OTHER TIPS

I suggest:

Don't get too clever with the Perl. If you start playing golf with the code, it's going to result in harder-to-read code. The code you write needs to be readable and clear more than it needs to be clever.
Document the code. If it's a module, add POD describing typical usage and methods. If it's a program, add POD to describe command line options and typical usage. If there's a hairy algorithm, document it and provide references (URLs) if possible.
Use the /.../x form of regular expressions, and document them. Not everyone understands regexes well.
Know what coupling is, and the pros/cons of high/low coupling.
Know what cohesion is, and the pros/cons of high/low cohesion.
Use modules appropriately. A nice well-defined, well-contained concept makes a great module. Reuse of such modules is the goal. Don't use modules simply to reduce the size of a monolithic program.
Write unit tests for you code. A good test suite will not only allow you to prove your code is working today, but tomorrow as well. It will also let you make bolder changes in the future, with confidence that you are not breaking older applications. If you do break things, then, well, your tests suite wasn't broad enough.

But overall, the fact that you care enough about maintainability to ask a question about it, tells me that you're already in a good place and thinking the right way.

I don't use all of Perl Best Practices, but that's the thing that Damian wrote it for. Whether or not I use all the suggestions, they are all worth at least considering.

What makes Perl code maintainable?

At the least:

use strict;
use warnings;

See perldoc perlstyle for some general guidelines that will make your programs easier to read, understand, and maintain.

One factor very important to code readability that I haven't seen mentioned in other answers is the importance of white space, which is both Perl-agnostic and in some ways Perl-specific.

Perl lets you write VERY concise code, but consise chunks don't mean they have to be all bunched together.

White space has lots of meaning/uses when we are talking about readability, not all of them widely used but most useful:

Spaces around tokens to easier separate them visually.

This space is doubly important in Perl due to prevalence of line noise characters even in best-style Perl code.

I find $myHashRef->{$keys1[$i]}{$keys3{$k}} to be less readable at 2am in the middle of producion emergency compared to spaced out: $myHashRef->{ $keys1[$i] }->{ $keys3{$k} }.

As a side note, if you find your code doing a lot of deep nested reference expressions all starting with the same root, you should absolutely consider assigning that root into a temporary pointer (see Sinan's comment/answer).

A partial but VERY important special case of this is of course regular expressions. The difference was illustrated to death in all the main materials I recall (PBP, RegEx O'Reilly book, etc..) so I won't lengthen this post even further unless someone requests examples in the comments.
Correct and uniform indentation. D'oh. Obviously. Yet I see way too much code 100% unreadable due to crappy indentation, and even less readable when half of the code was indented with TABs by a person whose editor used 4 character tabs and another by a person whose editor used 8 character TABs. Just set your bloody editor to do soft (e.g. space-emulated) TABs and don't make others miserable.
Empty lines around logically separate units of code (both blocks and just sets of lines). You can write a 10000 line Java program in 1000 lines of good Perl. Now don't feel like Benedict Arnold if you add 100-200 empty lines to those 1000 to make things more readable.
Splitting uber-long expressions into multiple lines, closely followed by...

Correct vertical alignment. Witness the difference between:

if ($some_variable > 11 && ($some_other_bigexpression < $another_variable || $my_flag eq "Y") && $this_is_too_bloody_wide == 1 && $ace > my_func() && $another_answer == 42 && $pi == 3) {

and

if ($some_variable > 11 && ($some_other_bigexpression < $another_variable || 
    $my_flag eq "Y") && $this_is_too_bloody_wide == 1 && $ace > my_func()
    && $another_answer == 42 && $pi == 3) {

and

if (   $some_variable > 11
    && ($some_other_bigexpression < $another_variable || $my_flag eq "Y")
    && $this_is_too_bloody_wide == 1
    && $ace > my_func()
    && $another_answer == 42
    && $pi == 3) {

Personally, I prefer to fix the vertical alignment one more step by aligning LHS and RHS (this is especially readable in case of long SQL queries but also in Perl code itself, both the long conditionals like this one as well as many lines of assignments and hash/array initializations):

if (   $some_variable               >  11
    && ($some_other_bigexpression   <  $another_variable || $my_flag eq "Y")
    && $this_is_too_bloody_wide    ==  1
    && $ace                         >  my_func()
    && $another_answer             ==  42
    && $pi                         ==  3  ) {

As a side note, in some cases the code could be made even more readable/maintainable by not having such long expressions in the first place. E.g. if the contents of the if(){} block is a return, then doing multiple if/unless statements each of which has a return block may be better.

i see this as an issue of people being told that perl is unreadable, and they start to make assumptions about the maintability of their own code. if you are conscientious enough to consider readability as a hallmark of quality code, chances are this critique doesn't apply to you.

most people will cite regexes when they discuss readability. regexes are a dsl embedded in perl and you can either read them or not. if someone can't take the time to understand something so basic and essential to many languages, i'm not concerned about trying to bridge some inferred cognitive gap...they should just man up, read the perldocs, and ask questions where necessary.

others will cite perl's use of short-form vars such as @_, $! etc. these are all easily disambiguated...i'm not interested in making perl look like java.

the upside of all of these quirks and perlisms is that codebases written in the language are often terse and compact. i'd rather read ten lines of perl than one hundred lines of java.

to me there is so much more to "maintainability" than simply having easy-to-read code. write tests, make assertions...do everything else you can do to lean on perl and its ecosystem to keep code correct.

in short: write programs to be first correct, then secure, then well-performing....once these goals have been met, then worry about making it nice to curl up with near a fire.

I would say the packaging/object models, that gets reflected in the directory structure for .pm files. For my PhD I wrote quite a lot of Perl code that I reuse afterwards. It was for automatic LaTeX diagram generator.

I'll talk some positive things to make Perl maintainable.

It's true that you usually shouldn't get too clever with really dense statements a la return !$@;#% and the like, but a good amount of clever using list-processing operators, like map and grep and list-context returns from the likes of split and similar operators, in order to write code in a functional style can make a positive contribution to maintainability. At my last employer we also had some snazzy hash-manipulation functions that worked in a similar way (hashmap and hashgrep, though technically we only fed them even-sized lists). For instance:

# Look for all the servers, and return them in a pipe-separated string
# (because we want this for some lame reason or another)
return join '|', 
       sort
       hashmap {$a =~ /^server_/ ? $b : +()} 
       %configuration_hash;

See also Higher Order Perl, http://hop.perl.plover.com - good use of metaprogramming can make defining tasks more coherent and readable, if you can keep the metaprogramming itself from getting in the way.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow