What is self-documenting code and can it replace well documented code? [closed]

https://stackoverflow.com/questions/209015

03-07-2019

Question

I have a colleague who insists that his code doesn't need comments, it's "self documenting."

I've reviewed his code, and while it's clearer than code which I've seen others produce, I still disagree that self-documenting code is as complete and useful as well commented and documented code.

Help me understand his point of view.

What is self-documenting code?
Can it really replace well commented and documented code?
Are there situations where it's better than well documented and commented code?
Are there examples where code cannot possibly be self-documenting without comments?

Maybe it's just my own limitations, but I don't see how it can be a good practice.

This is not meant to be an argument - please don't bring up reasons why well commented and documented code is high priority - there are many resources showing this, but they aren't convincing to my peer. I believe I need to more fully understand his perspective to convince him otherwise. Start a new question if you must, but don't argue here.

Wow, quick response! Please read all the existing answers and provide comments to answers rather than add new answers, unless your answer really is substantially different from every other answer in here.

Also, those of you who are arguing against self-documenting code: this is primarily to help me understand the perspective (i.e., positive aspects) of self-documenting code evangelists. I expect others will downvote you if you don't stay on topic.

Solution

In my opinion, any code should be self-documenting. In good, self-documented code, you don't have to explain every single line because every identifier (variable, method, class) has a clear semantic name. Having more comments than necessary actually makes it harder (!) to read the code, so if your colleague

writes documentation comments (Doxygen, JavaDoc, XML comments etc.) for every class, member, type and method AND
clearly comments any parts of the code that are not self-documenting AND
writes a comment for each block of code that explains the intent, or what the code does on a higher abstraction level (i.e. find all files larger than 10 MB instead of loop through all files in a directory, test if file size is larger than 10 MB, yield return if true)

his code and documentation are fine, in my opinion. Note that self-documented code does not mean that there should be no comments, but only that there should be no unnecessary comments. The thing is, however, that reading the code (including comments and documentation comments) should yield an immediate understanding of what the code does and why. If the "self-documenting" code takes longer to understand than commented code, it is not really self-documenting.
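To make the "higher abstraction level" point concrete, here is a minimal sketch in Python. The function name, the parameter names and the `TEN_MEGABYTES` constant are made up for illustration; only the "files larger than 10 MB" scenario comes from the answer above.

```python
import os
from typing import Iterator

TEN_MEGABYTES = 10 * 1024 * 1024  # hypothetical default limit


def find_files_larger_than(directory: str,
                           min_size_bytes: int = TEN_MEGABYTES) -> Iterator[str]:
    """Yield the paths of regular files in `directory` above the size limit."""
    # Intent-level comment: find all files larger than the limit.
    # (Not: "loop through the entries, test the size, yield if true";
    # the three lines below already say that.)
    for entry in os.scandir(directory):
        if entry.is_file() and entry.stat().st_size > min_size_bytes:
            yield entry.path
```

The name and the docstring carry the "what"; the single comment states the intent and deliberately avoids narrating the loop.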

OTHER TIPS

Well, since this is about comments and code, let's look at some actual code. Compare this typical code:

float a, b, c; a=9.81; b=5; c= .5*a*(b*b);

To this self-documenting code, which shows what is being done:

const float gravitationalForce = 9.81;
float timeInSeconds = 5;
float displacement = 0.5f * gravitationalForce * (timeInSeconds * timeInSeconds);

And then to this documented code, which better explains why it is being done:

/* compute displacement with Newton's equation x = vₒt + ½at² */
const float gravitationalForce = 9.81;
float timeInSeconds = 5;
float displacement = 0.5f * gravitationalForce * (timeInSeconds * timeInSeconds);

And the final version of code as documentation with zero comments needed:

float computeDisplacement(float timeInSeconds) {
    const float gravitationalForce = 9.81;
    float displacement = 0.5f * gravitationalForce * (timeInSeconds * timeInSeconds);
    return displacement;
}

Here's an example of a poor commenting style:

const float a = 9.81; //gravitational force
float b = 5; //time in seconds
float c = 0.5f*a*(b*b); //multiply the time and gravity together to get displacement.

In the last example, comments are used when variables should have been descriptively named instead, and the results of an operation are summarized when we can clearly see what the operation is. I would prefer the self-documented second example to this any day, and perhaps that is what your friend is talking about when he says self-documented code.

I would say that it depends on the context of what you are doing. To me, the self-documented code is probably sufficient in this case, but a comment detailing the methodology behind what is being done (in this example, the equation) is also useful.

The code itself is always going to be the most up-to-date explanation of what your code does, but in my opinion it's very hard for it to explain intent, which is the most vital aspect of comments. If it's written properly, we already know what the code does, we just need to know why on earth it does it!

Someone once said

1) Only write comments for code that's hard to understand.
2) Try not to write code that's hard to understand.

The idea behind "self-documenting" code is that the actual program logic in the code is trivially clear enough to explain to anyone reading the code not only what the code is doing but why it is doing it.

In my opinion, the idea of true self-documenting code is a myth. The code can tell you the logic behind what is happening, but it can't explain why it is being done a certain way, particularly if there is more than one way to solve a problem. For that reason alone it can never replace well commented code.
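A small Python sketch of the "more than one way to solve a problem" case. The function name and the precision requirement are assumptions made up for illustration; `math.fsum` and the built-in `sum` are both standard.

```python
import math


def total_weight_kg(weights_kg):
    """Return the sum of the given weights in kilograms."""
    # Why math.fsum and not the built-in sum(): we assume (hypothetically)
    # that callers pass many small fractional values and compare the total
    # against an exact limit. sum() accumulates floating-point rounding
    # error; fsum() tracks the lost digits and returns the correctly
    # rounded result.
    return math.fsum(weights_kg)
```

Both `sum(weights_kg)` and `math.fsum(weights_kg)` read as "add these up". Only the comment records that the simpler option was considered and why it was rejected.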

I think it's relevant to question whether a particular line of code is self-documenting, but in the end if you do not understand the structure and function of a slice of code then most of the time comments are not going to help. Take, for example, amdfan's slice of "correctly-commented" code:

/* compute displacement with Newton's equation x = v0t + ½at^2 */
const float gravitationalForce = 9.81;
float timeInSeconds = 5;
float displacement = 0.5f * gravitationalForce * (timeInSeconds * timeInSeconds);

This code is fine, but the following is equally informative in most modern software systems, and explicitly recognizes that using a Newtonian calculation is a choice which may be altered should some other physical paradigm be more appropriate:

const float accelerationDueToGravity = 9.81;
float timeInSeconds = 5;
float displacement = NewtonianPhysics.CalculateDisplacement(accelerationDueToGravity, timeInSeconds);

In my own personal experience, there are very few "normal" coding situations where you absolutely need comments. How often do you end up rolling your own algorithm, for example? Basically everything else is a matter of structuring your system so that a coder can comprehend the structures in use and the choices which drove the system to use those particular structures.

I forget where I got this from, but:

Every comment in a program is like an apology to the reader. "I'm sorry that my code is so opaque that you can't understand it by looking at it". We just have to accept that we are not perfect but strive to be perfect and go right on apologizing when we need to.

Self-documenting code is a good example of "DRY" (Don't Repeat Yourself). Don't duplicate information in comments which is, or can be, in the code itself.

Rather than explain what a variable is used for, rename the variable.

Rather than explain what a short snippet of code does, extract it into a method and give it a descriptive name (perhaps a shortened version of your comment text).

Rather than explain what a complicated test does, extract that into a method too and give it a good name.

Etc.

After this you end up with code that doesn't require as much explanation because it explains itself, so you should delete the comments which merely repeat information in the code.

This doesn't mean you have no comments at all. There is some information you can't put into the code, such as information about intent (the "why"). In the ideal case the code and the comments complement each other, each adding unique explanatory value without duplicating the information in the other.
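The "extract it and give it a good name" advice above can be sketched like this in Python. The `Order` type, the constants and the shipping rule are all hypothetical, invented only to have a complicated test to extract.

```python
from dataclasses import dataclass

DOMESTIC_COUNTRY = "US"          # hypothetical business constants
FREE_SHIPPING_THRESHOLD = 50.0
STANDARD_SHIPPING_COST = 4.99


@dataclass
class Order:
    total: float
    is_member: bool
    country: str


def shipping_cost_before(o: Order) -> float:
    # free shipping for domestic members with a big enough order
    if o.is_member and o.country == "US" and o.total >= 50.0:
        return 0.0
    return 4.99


def qualifies_for_free_shipping(order: Order) -> bool:
    # The comment from the "before" version became this function's name.
    return (order.is_member
            and order.country == DOMESTIC_COUNTRY
            and order.total >= FREE_SHIPPING_THRESHOLD)


def shipping_cost(order: Order) -> float:
    if qualifies_for_free_shipping(order):
        return 0.0
    return STANDARD_SHIPPING_COST
```

The two versions behave the same. In the second one the comment is gone because the function and constant names now carry the same information, and they cannot drift out of sync with the code the way the comment could.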

Self-documenting code is a good practice and, if done properly, can easily convey the meaning of the code without the reader having to read too many comments, especially in situations where the domain is well understood by everyone in the team.

Having said that, comments can be very helpful for newcomers or for testers, or to generate documentation/help files.

Self-documenting code + necessary comments will go a long way towards helping people across teams.

First of all, it's good to hear that your colleague's code is in fact clearer than other code you have seen. It means that he's probably not using "self-documenting" as an excuse for being too lazy to comment his code.

Self-documenting code is code that does not require free-text comments for an informed reader to understand what it is doing. For example, this piece of code is self-documenting:

print "Hello, World!"

and so is this:

factorial n = product [1..n]

and so is this:

from BeautifulSoup import BeautifulSoup, Tag

def replace_a_href_with_span(soup):
    links = soup.findAll("a")
    for link in links:
        tag = Tag(soup, "span", [("class", "looksLikeLink")])
        tag.contents = link.contents
        link.replaceWith(tag)

Now, this idea of an "informed reader" is very subjective and situational. If you or anyone else is having trouble following your colleague's code, then he'd do well to re-evaluate his idea of an informed reader. Some level of familiarity with the language and libraries being used must be assumed in order to call code self-documenting.

The best argument I have seen for writing "self-documenting code" is that it avoids the problem of free-text commentary not agreeing with the code as it is written. The best criticism is that while code can describe what and how it is doing by itself, it cannot explain why something is being done a certain way.

In order:

Self-documenting code is code that clearly expresses its intent to the reader.
Not entirely. Comments are always helpful for commentary on why a particular strategy was chosen. However, comments which explain what a section of code is doing are indicative of code that is insufficiently self-documenting and could use some refactoring.
Comments lie and become out of date. Code ~~always tells~~ is more likely to tell the truth.
I've never seen a case where the what of code couldn't be made sufficiently clear without comments; however, like I said earlier, it is sometimes necessary/helpful to include commentary on the why.

It's important to note, however, that truly self-documenting code takes a lot of self- and team-discipline. You have to learn to program more declaratively, and you have to be very humble and avoid "clever" code in favor of code that is so obvious that it seems like anyone could have written it.

For one, consider the following snippet:

/**
 * Sets the value of foobar.
 *
 * @param foobar the new value of foobar.
 */
 public void setFoobar(Object foobar) {
     this.foobar = foobar;
 }

In this example you have 5 lines of comments per 3 lines of code. Even worse - the comments do not add anything which you can't see by reading the code. If you have 10 methods like this, you can get 'comment blindness' and not notice the one method that deviates from the pattern.

Of course, a better version would have been:

/**
 * The serialization of the foobar object is used to synchronize the qux task.
 * The default value is a unique instance; override if needed.
 */
 public void setFoobar(Object foobar) {
     this.foobar = foobar;
 }

Still, for trivial code I prefer not having comments. The intent and the overall organization is better explained in a separate document outside of the code.

When you read "self-documenting code", you see what it is doing, but you cannot always guess why it is doing it in that particular way.

There are tons of non-programming constraints like business logic, security, user demands etc.

When you do maintenance, that background information becomes very important.

Just my pinch of salt...

One thing that you may wish to point out to your colleague is that no matter how self-documenting his code is, if other alternate approaches were considered and discarded that information will get lost unless he comments the code with that information. Sometimes it's just as important to know that an alternative was considered and why it was decided against and code comments are most likely to survive over time.

Have you heard of Donald Knuth's "WEB" project to implement his Literate Programming concept? It's more than self-documenting code; it's more like documentation that can be compiled and executed as code. I don't know how much it is used today though.

The difference is between "what" and "how".

You should document "what" a routine does.
You should not document "how" it does it, except in special cases (e.g. to refer to a specific algorithm paper). The "how" should be self-documented.
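A rough Python illustration of that split. The function is a generic example chosen for this sketch, not something from the thread: the docstring says what the routine does, and the only "how" comment is the kind of special-case pointer mentioned above.

```python
def greatest_common_divisor(a: int, b: int) -> int:
    """Return the largest non-negative integer that divides both a and b.

    Returns 0 when both arguments are 0.
    """
    # How: Euclid's algorithm. Named here only so a reader can look it up;
    # the loop itself is left to speak for itself.
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a
```

A caller only needs the docstring. A maintainer gets one pointer to the algorithm and otherwise reads the code.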

In a company where I worked one of the programmers had the following stuck to the top of her monitor.

"Document your code like the person who maintains it is a homicidal maniac who knows where you live."

The point of view that code is self-documenting drives me crazy. A particular line of code or a sub-algorithm may indeed be self-documenting, but its purpose in the greater picture simply is not.

I got so frustrated with this a month or two ago I wrote an entire blog post describing my point of view. Post here.

Self-documenting code normally uses variable names that match exactly what the code is doing, so that it is easy to understand what is going on.

However, such "self-documenting code" will never replace comments. Sometimes code is just too complex and self-documenting code is not enough, especially when it comes to maintainability.

I once had a professor who was a firm believer in this theory. In fact, the best thing I ever remember him saying is "Comments are for sissies".
It took all of us by surprise at first, but it makes sense.
However, even though you may be able to understand what is going on in the code, someone less experienced than you may come along later and not understand it. This is when comments become important. We often do not believe they are important, but there are very few cases where comments are unnecessary.

I'm surprised that nobody has brought up "Literate Programming", a technique developed in 1981 by Donald E. Knuth of TeX and "The Art of Computer Programming" fame.

The premise is simple: since the code has to be understood by a human and comments are simply thrown away by the compiler, why not give everyone the thing they need - a full textual description of the intent of the code, unfettered by programming language requirements, for the human reader and pure code for the compiler.

Literate Programming tools do this by giving you special markup for a document that tells the tools what part should be source and what is text. The program later rips the source code parts out of the document and assembles a code file.
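The "rip the source code parts out of the document" step can be sketched in a few lines of Python. This is a heavily simplified toy, not how real Literate Programming tools are implemented. It assumes a noweb-like markup where a line of the form `<<name>>=` starts a code chunk and a line starting with `@` switches back to prose. It does not expand references to other chunks, and it would mishandle code lines that themselves begin with `@`.

```python
import re

# Assumed marker for the start of a code chunk, e.g. "<<main>>=".
CHUNK_START = re.compile(r"^<<(.+)>>=\s*$")


def tangle(document: str) -> str:
    """Return only the code lines of a literate document, in document order."""
    code_lines = []
    inside_code = False
    for line in document.splitlines():
        if CHUNK_START.match(line):
            inside_code = True       # code begins on the next line
        elif line.startswith("@"):
            inside_code = False      # back to prose for the human reader
        elif inside_code:
            code_lines.append(line)
    return "\n".join(code_lines) + "\n"
```

Feeding it a document with one prose paragraph and one chunk returns just the chunk, which could then be written to a source file for the compiler or interpreter.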

I found an example on the web of it: http://moonflare.com/code/select/select.nw or the HTML version http://moonflare.com/code/select/select.html

If you can find Knuth's book on it in a library (Donald E. Knuth, Literate Programming, Stanford, California: Center for the Study of Language and Information, 1992, CSLI Lecture Notes, no. 27.) you should read it.

That's self-documenting code, complete with reasoning and all. It even makes a nice document. Everything else is just well written comments :-)

My view is written in this post:

The one single tip to document your code.

Excerpt:

Instead of writing a lot of comments to explain the subtle behaviors of your program, why not restructure your logic so that it is self-evident? Instead of documenting what a method is doing, why not choose a clear name for that method? Instead of tagging your code to indicate unfinished work, why not just throw a NotImplementedException()? Instead of worrying whether your comments sound polite enough to your boss, your colleagues or anyone reading the code, why not just stop worrying by not writing them at all?

The clearer your code is, the easier it is to maintain it, to extend it, and to work on it in future editions. The less odorous your code is, the less need there is to comment it. The more comments there are, the higher the maintenance cost.

I would like to offer one more perspective to the many valid answers:

What is source code? What is a programming language?

The machines don't need source code. They're happy running assembly. Programming languages are for our benefit. We don't want to write assembly. We need to understand what we are writing. Programming is about writing code.

Should you be able to read what you write?

Source code is not written in human language. It has been tried (for example FORTRAN) but it isn't completely successful.

Source code can't have ambiguity. That's why we have to put more structure in it than we do with text. Text only works with context, which we take for granted when we use text. Context in source code is always explicit. Think "using" in C#.

Most programming languages have redundancy so that the compiler can catch us when we aren't coherent. Other languages use more inference and try to eliminate that redundancy.

Type names, method names and variable names are not needed by the computers. They are used by us for referencing. The compiler doesn't understand semantics, that's for us to use.

Programming languages are a linguistic bridge between man and machine. It has to be writable for us and readable for them. Secondary demands are that it should be readable to us. If we are good at semantics where allowed and good at structuring the code, source code should be easy to read even for us. The best code doesn't need comments.

But complexity lurks in every project, you always have to decide where to put the complexity, and which camels to swallow. Those are the places to use comments.

Self-documenting code is an easy way out of the problem that, over time, code, comments and documentation diverge. It is also a disciplining factor for writing clear code (if you are that strict with yourself).

For me, these are the rules I try to follow:

Code should be as easy and clear to read as possible.
Comments should give reasons for the design decisions I took (e.g. why do I use this algorithm?) or state limitations the code has (e.g. does not work when ...; this should also be handled by a contract/assertion in the code). They usually live within the function/procedure.
Documentation should list usage (calling conventions), side effects, and possible return values. It can be extracted from code using tools like jDoc or xmlDoc. It therefore usually sits outside the function/procedure, but close to the code it describes.

This means that all three means of documenting code live close together and therefore are more likely to be changed when the code changes, but do not overlap in what they express.
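As a sketch of how the three can sit next to each other without overlapping, here is a small Python function. The function itself is a made-up example, and a docstring stands in for the extractable documentation.

```python
def moving_average(values, window):
    """Return the average of each consecutive `window`-sized slice of `values`.

    Usage: moving_average([1, 2, 3, 4], 2) returns [1.5, 2.5, 3.5].
    Returns an empty list when `values` has fewer than `window` items.
    Side effects: none; `values` is not modified.
    """
    # Limitation, enforced as a contract rather than only described:
    assert window > 0, "window must be a positive integer"

    if len(values) < window:
        return []

    # Design decision: keep a running sum instead of re-summing every slice,
    # so the cost stays linear in len(values) even for large windows.
    running_sum = sum(values[:window])
    averages = [running_sum / window]
    for i in range(window, len(values)):
        running_sum += values[i] - values[i - window]
        averages.append(running_sum / window)
    return averages
```

The docstring covers usage and return values, the comments cover one limitation and one design reason, and the names cover the rest. None of the three repeats another.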

The real problem with so-called self-documenting code is that it only conveys what it actually does. While some comments may help someone understand the code better (e.g., algorithm steps), they are to a degree redundant, and I doubt that argument would convince your peer.

However, what is really important in documentation is the stuff that is not directly evident from the code: underlying intent, assumptions, impacts, limitations, etc.

Being able to determine that code does X from a quick glance is way easier than being able to determine that code does not do Y. He has to document Y...

You could show him an example of code that looks good and is obvious, but doesn't actually cover all the bases of the input, for example, and see if he finds the gap.
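One candidate for such an example, sketched in Python. Both functions are invented for this illustration.

```python
def average(values):
    return sum(values) / len(values)


def average_with_documented_gap(values):
    """Return the arithmetic mean of `values`.

    Does NOT handle an empty sequence: there is no meaningful mean, so the
    call raises ZeroDivisionError. Callers must check for emptiness first.
    """
    return sum(values) / len(values)
```

The first version is about as self-documenting as code gets, and a quick glance confirms that it "does X". Nothing in it tells the reader that empty input was never handled. In the second version the docstring is the only place where that "does not do Y" is written down.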

I think that self-documenting code is a good replacement for commenting. If you require comments to explain how or why code is the way it is, then you have function or variable names that should be modified to be more explanatory. It is down to the coder, though, whether he makes up the shortfall with a comment or by renaming some variables and functions and refactoring the code.

It can't really replace your documentation though, because documentation is what you give to others to explain how to use your system, rather than how it does things.

Edit: I (and probably everyone else) should probably add the proviso that a Digital Signal Processing (DSP) app should be very well commented. That's mainly because DSP apps are essentially 2 for loops fed with arrays of values that add/multiply/etc. said values... to change the program you change the values in one of the arrays... it needs a couple of comments to say what you are doing in that case ;)

When writing mathematical code, I have sometimes found it useful to write long, essay-like comments, explaining the math, the notational conventions the code uses, and how it all fits together. We're talking hundreds of lines of documentation, here.

I try to make my code as self-documenting as possible, but when I come back to work on it after a few months, I really do need to read the explanation to keep from making a hash out of it.

Now, of course this kind of extreme measure isn't necessary for most cases. I think the moral of the story is: different code requires different amounts of documentation. Some code can be written so clearly that it doesn't need comments -- so write it that clearly and don't use comments there!

But lots of code does need comments to make sense, so write it as clearly as possible and then use as many comments as it needs...

I would argue - as many of you do - that to be truly self documenting, code needs to show some form of intent. But I'm surprised nobody mentioned BDD yet - Behavior Driven Development. Part of the idea is that you have automated tests (code) explaining the intent of your code, which is so difficult to make obvious otherwise.

Good domain modeling 
+ good names (variables, methods, classes) 
+ code examples (unit tests from use cases) 
= self documenting software
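A very small Python sketch of tests acting as "code examples". The discount rule and all names are hypothetical, and plain test functions stand in for a real BDD framework.

```python
def apply_discount(price, is_member):
    # Hypothetical rule: members get ten percent off, rounded to cents.
    return round(price * 0.9, 2) if is_member else price


def test_members_get_ten_percent_off():
    assert apply_discount(100.0, is_member=True) == 90.0


def test_non_members_pay_full_price():
    assert apply_discount(100.0, is_member=False) == 100.0
```

Read top to bottom, the test names form a short specification of the intended behavior. Unlike a comment, they fail loudly when the code stops matching them.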

A couple of reasons why extra comments in addition to the code might be clearer:

The code you're looking at was generated automatically, and hence any edits to the code might be clobbered the next time the project is compiled
A less-than-straightforward implementation was traded off for a performance gain (unrolling a loop, creating a lookup table for an expensive calculation, etc.)

It's going to come down to what the team values in its documentation. I would suggest that documenting why/intent instead of how is important, and this isn't always captured in self-documenting code. Getters and setters, no, those are obvious; but for calculation, retrieval, etc., something of the why should be expressed.

Also be aware of differences within your team if its members come from different nationalities. Differences in diction can creep into the naming of methods:

BisectionSearch

BinarySearch

BinaryChop

These three methods, contributed by developers trained on 3 different continents, do the same thing. Only by reading the comments that described the algorithm were we able to identify the duplication in our library.

For me, reading code that needs comments is like reading text in a language I do not know. I see a statement and I do not understand what it does or why, so I have to look at the comments. I read a phrase and I need to look in a dictionary to understand what it means.

It is usually easy to write code that self-documents what it does. To tell you why it does so, comments are more suitable, but even here code can be better. If you understand your system on every level of abstraction, you should try organizing your code like

public Result whatYouWantToDo(){
  howYouDoItStep1();
  howYouDoItStep2();
  return resultOfWhatYouHaveDone;
}

Here the method name reflects your intent and the method body explains how you achieve your goal. You cannot tell an entire book in its title anyway, so the main abstractions of your system still have to be documented, as well as complex algorithms, non-trivial method contracts and artifacts.

If the code that your colleague produces is really self-documented, lucky you and him. If you think that your colleague's code needs comments, it needs them. Just open the most non-trivial place in it, read it once and see whether you understood everything. If the code is self-documented, then you should. If not, ask your colleague a question about it, and after he gives you an answer, ask why that answer was not documented in comments or code beforehand. He can claim that the code is self-documenting for a person as smart as him, but he still has to respect other team members: if your tasks require understanding his code and his code does not explain everything you need to understand, it needs comments.

Licensed under: CC-BY-SA with attribution
