I am an advocate of commenting source code and documenting software products. In my experience, rigorously commented source code has helped me in several ways whenever I have had to extend or maintain software.
However, there’s another camp that says commenting is ultimately worthless, or that its value is questionable. Numerous proponents of coding without commenting argue that:
- If a piece of code is well-written, it is self-explanatory and hence does not need commenting
- If a piece of code is not self-explanatory, then refactor it and make it self-explanatory so that it does not need any comments
- Your test suite is your live documentation
- Over time, code and comments drift out of sync, which becomes another source of headaches
- Agile says working code is more important than piles of documentation, so we can safely ignore writing comments
To me, this is just dogma. My personal observation has been that software written by teams of smart and experienced developers ultimately ends up with a considerable amount of code that is not self-explanatory.
Moreover, the Java API, Cocoa API, Android API, etc. show that it is possible to write and maintain quality documentation if you want to.
Having said all this, conversations about the pros and cons of documentation and source code comments that are based on personal beliefs usually do not end well and lead to no satisfying conclusions.
As such I am looking for academic papers and empirical studies about the effects of software documentation, especially commenting source code, on its quality and maintainability as well as its effects on team productivity.
Have you come across such articles, and if so, what did they conclude?
In “The effect of modularization and comments on program comprehension” (1981), Woodfield, Dunsmore, and Shen found that “subjects whose programs contained comments were able to answer more questions than those without comments.”
However, in “Learning a Metric for Code Readability” (2010), Raymond P.L. Buse and Westley Weimer found that comments have only a limited effect on readability and quality:
From the abstract:
We construct an automated readability measure and… show that this metric correlates strongly with three measures of software quality: code changes, automated defect reports, and defect log messages… Our data suggests that comments, in of themselves, are less important than simple blank lines to local judgments of readability.
From page 12:
We found that comments were are only moderately well-correlated with our annotators’ notion of readability (33% relative power). One conclusion may be that while comments can enhance readability, they are typically used in code segments that started out less readable: the comment and the unreadable code effectively balance out. The net effect would appear to be that comments are not always, in and of themselves, indicative of high or low readability.
Keep in mind that the “coding without commenting” proponents aren’t saying that code without comments is better than code with comments. They’re arguing that a particular style of code without comments – one that extracts code into methods with self-describing names, one that introduces explaining variables, one that has a good test suite – is better than code that doesn’t do those things but does have comments. This could complicate the applicability of any studies that have been done.
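To make the distinction concrete, here is a small sketch of the two styles being compared. It is only an illustration: the order-pricing scenario and every name in it (`price_a`, `price_b`, `qualifies_for_member_discount`, the `member` and `total_cents` keys) are hypothetical and are not taken from either study.

```python
# Style A: terse code whose intent is carried by a comment.
def price_a(o):
    # members get 10% off orders of 100.00 or more
    if o["member"] and o["total_cents"] >= 10000:
        return o["total_cents"] - o["total_cents"] * 10 // 100
    return o["total_cents"]


# Style B: the same behavior, with the rule extracted into a function with a
# self-describing name and the conditions held in explaining variables.
MEMBER_DISCOUNT_PERCENT = 10
MEMBER_DISCOUNT_THRESHOLD_CENTS = 10000


def qualifies_for_member_discount(order):
    is_member = order["member"]
    is_large_order = order["total_cents"] >= MEMBER_DISCOUNT_THRESHOLD_CENTS
    return is_member and is_large_order


def price_b(order):
    total = order["total_cents"]
    if qualifies_for_member_discount(order):
        discount = total * MEMBER_DISCOUNT_PERCENT // 100
        return total - discount
    return total
```

Both functions compute the same price. A study that only compares "comments present" with "comments absent" would treat Style B as the uncommented case, even though much of the comment's information has moved into names.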