Code reviewing nitpicks for readability [actually for DRY]

https://softwareengineering.stackexchange.com/questions/398897

02-03-2021
|

Question

I was doing a code review today and noticed that the code would work but could have easily been made more readable.

The example in particular was roughly as follows:

def group(x):
    GROUP_ALPHA = ('a', 'b', 'e', 'f', 'i', 'j', 'k', 'l')
    GROUP_BETA = ('m', 'n', 'o', 'p', 'q')
    GROUP_CHARLIE = ('x', 'y', 'z', 'aa', 'ab', 'ae', 'af')
    GROUP_DELTA = ('ag', 'ah', 'ai', 'aj')
    GROUP_ECHO = ('ak', 'al', 'am', 'an', 'ao', 'ar', 'as')

    if x in GROUP_ALPHA:
        return 'ALPHA'
    elif x in GROUP_BETA:
        return 'BETA'
    elif x in GROUP_CHARLIE:
        return 'CHARLIE'
    elif x in GROUP_DELTA:
        return 'DELTA'
    elif x in GROUP_ECHO:
        return 'ECHO'

Note that every group is disjoint.

To me, this would be more readable with:

def group(x):
    GROUPS = {
        'ALPHA': ('a', 'b', 'e', 'f', 'i', 'j', 'k', 'l'),
        'BETA': ('m', 'n', 'o', 'p', 'q'),
        'CHARLIE': ('x', 'y', 'z', 'aa', 'ab', 'ae', 'af'),
        'DELTA': ('ag', 'ah', 'ai', 'aj'),
        'ECHO': ('ak', 'al', 'am', 'an', 'ao', 'ar', 'as')
    }
    for group_name, group_elements in GROUPS:
        if x in group_elements:
            return group_name

Should I mention this? The original code works and is used in a non-critical capacity. When I was researching, I noticed this answer, which remarks:

Every change costs money. Rewriting code that works and is unlikely to need to be changed could be wasted effort. Focus your attention on the sections that are more subject to change or that are most important to the project.

This would suggest that the answer is no, but it seems wasteful to have a code review and not suggest a better way to do things, even if the original way "works". Personally I would want someone to make such a suggestion to me.

Note: the above examples have been edited since the question was first written. As well, each entry ('a', 'b', etc.) is actually a human readable str related to business logic.

Solution

The given example does IMHO not make a big difference in readability - in fact, it is quite debatable if the second variant is really more readable than the first one. However, it is clearly an improvement in following the DRY principle, and that is a very valid reason to change the code.

Let us assume the intention is to deal with all these groups always in a similar fashion, which I guess is quite likely. Let us also assume there are at least five of these groups. Now you get the new requirement to change the logic to something like

if lower(x) in GROUP_ALPHA:

then in the first example, you will have to change 5 places, make sure you change them identically and don't forget any place. In the second example, there will be only one place to change, which is definitely an improvement. Further, if you are going to write a unit test for this function, or specifically for the change to this function, the first variant will require 5 different test cases for achieving full coverage, and the second only one.

Even simpler - assume you just want to add another group - in the first example, you will have to change two places in code (which includes copying the whole "search" section), in the second example, you will only have to extend the dictionary.

Keeping code DRY is not just a matter of "taste" - it is a matter of bringing code into a state where it is less likely to cause future headaches. And code reviews are a good opportunity for this.

OTHER TIPS

One benefit of code reviews is indeed to suggest better ways to do things. Better, of course, means different things in different contexts, but can include code that is more readable, is faster, consumes less memory, follows good design principles, is less complex, etc. Suggesting a change that is more readable is a perfect example of something to note during a code review.

Every change costs money, but so does trying to decipher badly-written code 6 months from now.

Keep your target audience in mind, however, and realize that it may be more readable to you but less readable to others (especially if you work with a lot of junior engineers). Also keep your application environment / needs in mind when you suggest these changes. While your solution may be faster, faster code may not be necessary. If the dictionary is only ever going to have 2 keys in it that map to a list of 3 elements, performance will never be a problem, regardless of how you set it up. It's probably still worth bringing up, as the rest of your team can use it the next time, but realize that it is not something to dig in your heels on.

The proposed change introduces a subtle bug, so I guess this is a good example of why suggestions should be vetted carefully.

If x is member of multiple groups, the first example will return the first group found in the explicit order of the if's. In the second example, the groups will be checked in an unspecified order (since dictionaries are unordered), so which of the groups it returns is undefined. This is especially insidious since it might be the expected output in unit-tests, but this could change on a different platform or with a version update.

The underlying issue is that the new code doesn't communicate the intention clearly. If the expected result indeed is non-deterministic, then I think it should be clearly communicated in documentation, since this definitely breaks the "principle of least surprise".

If on the other hand you expect an x to appear in exactly one group, then I think the data structure should be changed to communicate this clearly, e.g.

def group(x):
    GROUPS = {
      'a': 'ALPHA', 
      'c': 'ALPHA',
      'd': 'ALPHA',
      'h': 'BETA', 
      'k': 'BETA', 
      'l': 'BETA'
    }
    return GROUPS[x]

This is also much simpler code, since the assumptions are encoded in the data structure rather than in iteration loops.

That said, I agree that it is a good idea to suggest readability improvements in code reviews. But this is really something you should discuss in your team.

For a code review, your goal should be to check if the code is acceptable. Could it be done better? That's a dubious reason for a change. Could it be done better in your opinion? That's a more dubious reason.

Do you prefer it better another way? That's no reason at all, because if you had written the code and I reviewed it, I would prefer it better my way. So the code always ends up different from the author's preference, creating useless extra work.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange