markh@csd4.milw.wisc.edu (Mark William Hopkins) (08/16/88)
This is a new term to describe the experience I've had to go through in going through several dozen kilobytes of programs over the last few days. I would like to get my hands on the guy who said that you can never put too many comments in your code. What I've found is that comments are often used to compensate for bad programming in the following ways: (1) Bad variable naming (too cryptic or named in a way unrelated to its use.) -- thus making those otherwise unnecessary comments on the variables use necessary. The rule of thumb is: If you've found it necessary to comment on the use of a variable then you've given it the wrong name. (2) Hyfalutenated code, like this: if A = 0 then Flag := true else Flag := false {Flag is the condition that A is 0.} Just write Flag := (A = 0) and let the code speak for itself (or better yet, don't use a flag). There's other examples more tangled than this, but this is the first and foremost one that comes to mind. (3) Bad formatting, like this: while I < 0 do begin for I := 1 to 6 do begin read(A[I]) end; {for loop} readln; end {while loop} I took a small example here, but imagine that this were a huge nested loop. Imagine how much the formatting is compounding the problem of recognition where all you are really saying is this: while I < 0 do begin for I := 1 to 6 do read(A[I]); readln end It makes it utterly unnecessary to indicate what your "ends" are going with because you can see it clearly when the code is short enough for the "begin" and "end" to occupy the same page. This gets me to the second rule of thumb: If you find it necessary to make extensive comments on your program text, then you've written the program the wrong way. Again, let the code speak for itself and keep it short and simple. (4) Bad use of variables I've seen stuff like this: program ... ... <3658 variable declarations> x: integer; {local variable used in the procedure spfxdz.} r: integer; {a counter.} ... (A) Look at x. If you're going to use it as a local variable then declare it as such and don't clutter the main program with 3658 variables. (B) Counters should be declared to be of a subrange type, e.g. r: 1..StringSIZE ... after all, what else are subranges for? The fact that you declare it this way makes the comment unnecessary and makes the program more direct and transparent. There's others, but the general pattern is apparent: comments are being used extensively to compensate for bad programming. One of the most recent programs I modified was about 9k and 450 lines long. Now when you write long programs, a good sense of style will compel you to add a lot of comments. That's the problem. The program is already long enough as is, the extra pollution just makes it all the more obscure. I finally got that program down to 100 lines and 2k. There were still a couple (appropriate) comments in it, but nothing like the cancer I had to weed out. And that's the point: keep your code short, simple and direct and let it (or I should say: GET IT TO ...) speak for itself.
art@buengc.BU.EDU (A. R. Thompson) (08/17/88)
In article <6508@uwmcsd1.UUCP> markh@csd4.milw.wisc.edu (Mark William Hopkins) writes: > >The rule of thumb is: > If you've found it necessary to comment on the use of a variable > then you've given it the wrong name. This does not always work a neatly as we would like. Once, several years ago, I and a colleague were working on a Pascal compiler that was to implement the (then) proposed standard. Instead of starting from scratch, we modified an existing compiler. There was a boolean variable spelled "ttyinuse". Upon reading the code we discovered that when the variable "ttyinuse" was true the tty was NOT in use! I'm not arguing in favor of excessive comments. I am however saying that assigning "meaningful" spellings to identifiers can be very tricky. >There's other examples more tangled than this, but this is the first and >foremost one that comes to mind. > > (3) Bad formatting, like this: > > while I < 0 do > begin > for I := 1 to 6 do > begin > read(A[I]) > end; {for loop} > readln; > end {while loop} > >I took a small example here, but imagine that this were a huge nested loop. >Imagine how much the formatting is compounding the problem of recognition >where all you are really saying is this: > > while I < 0 do begin > for I := 1 to 6 do read(A[I]); > readln > end > >It makes it utterly unnecessary to indicate what your "ends" are going with >because you can see it clearly when the code is short enough for the >"begin" and "end" to occupy the same page. Er, this is just plain wrong code. From Jensen and Wirth, "Pascal User Manual and Report" 3rd ed. revised by Mickel and Miner, p. 39: "The control variable is left undefined upon normal exit from the for statement." That means that the variable "I" cannot legally be used as it is in this example as it is undefined when it is used in the test of the while statement. Not all compilers catch this, though it can be done. > >(B) Counters should be declared to be of a subrange type, e.g. > >r: 1..StringSIZE > >... after all, what else are subranges for? The fact that you declare it >this way makes the comment unnecessary and makes the program more direct >and transparent. Hear hear! Try this as a suggested improvement: const mincounter=1; stringsize=whatehaveyou; type counterrange=mincounter..stringsize; var r:conterrange; This allows you to change the counter range by changing only the value of a constant and buys you the advantages of named types. A subtle problem occurs when you use type integer willy-nilly when subranges should be used. The code for a for statement usually increments the control variable and then tests to see if the final value has been exceeded. If the control variable assumes the value maxint this incrementation will cause overflow. So, the compiler must generate extra code to be sure maxint will not be exceeded prior to the increment and test step. That test adds at least two instructions to each pass through the loop. If the loop is only executed a few times, it's no big deal, but if the loop is executed thousands of times (not all that rare in real programs) then many extra instructions are executed when a little careful programming would avoid them. > >I finally got that program down to 100 lines and 2k. There were still a >couple (appropriate) comments in it, but nothing like the cancer I had >to weed out. And that's the point: keep your code short, simple and direct >and let it (or I should say: GET IT TO ...) speak for itself. Again, hear, hear. There is nothing like well written programs. Well written programs are (usually) self documenting. Save comments for those parts of the code that are difficult to follow because the solution is inherently difficult to follow. Remember, bad comments can be just as misleading as bad code.
stuf@freja.dk (Kristoffer H. Holm) (08/19/88)
In article <840@buengc.BU.EDU> art@buengc.BU.EDU (A. R. Thompson) writes: >... >we modified an existing compiler. There was a boolean variable spelled >"ttyinuse". Upon reading the code we discovered that when the variable >"ttyinuse" was true the tty was NOT in use! >I'm not arguing in favor of excessive comments. I am however saying that >assigning "meaningful" spellings to identifiers can be very tricky. Why, it should have been named tty_not_in_use (or TtyNotInUse if you prefer that)! In my opinion, properly written Pascal programs only need intensional comments, stating the purpose of chunks of code -- the primary purpose of comments is to allow the reader/maintenance programmer to SKIP code that is irrelevant to him/her! In the example a comment would have done no good (especially not a misleading one)! >... >Er, this is just plain wrong code. From Jensen and Wirth, "Pascal User >Manual and Report" 3rd ed. revised by Mickel and Miner, p. 39: "The >control variable is left undefined upon normal exit from the for >statement." That means that the variable "I" cannot legally be used as it >is in this example as it is undefined when it is used in the test of the >while statement. Not all compilers catch this, though it can be done. Right. While we're at it: variables used as counters in for loops must also be local to the enclosing <block> (program/procedure/function body) of the loop, making "{loop variable in procedure xxxx}" comments dubious. >Try this as a suggested improvement: >const > mincounter=1; > stringsize=whatehaveyou; >type > counterrange=mincounter..stringsize; >var > r:conterrange; >This allows you to change the counter range by changing only the value of >a constant and buys you the advantages of named types. In general I don't agree: you use the wrong principle. Constants and Types shall only be named if they can be misunderstood, or are used in more than one place. And it is OK to have the constants 0 and 1 in ranges. Otherwise you commit a sin similar to over-commenting, and use to many different names. In the above example, type string_index = 1..whatever; var r: stringindex; is more readable---note that "string_index" is a better type name than "counterrange": it shows the intension of the type, not its applications! Of course, "r" should be called something else (not apparent from the example). Regards, Kristoffer. --------------------------------------------------------------------------- Kristoffer H. Holm eunet: stuf@diku.dk Institute of Datalogy (1. sal N), Univ. of Copenhagen, DK-2100 Copenhagen East, DENMARK ---------------------------------------------------------------------------
art@buengc.BU.EDU (A. R. Thompson) (08/31/88)
In article <3987@freja.dk> stuf@freja.dk (Kristoffer H. Holm) writes: >In article <840@buengc.BU.EDU> art@buengc.BU.EDU (A. R. Thompson) writes: >>... >>we modified an existing compiler. There was a boolean variable spelled >>"ttyinuse". Upon reading the code we discovered that when the variable >>"ttyinuse" was true the tty was NOT in use! >>I'm not arguing in favor of excessive comments. I am however saying that >>assigning "meaningful" spellings to identifiers can be very tricky. > >Why, it should have been named tty_not_in_use (or TtyNotInUse if you prefer >that)! In my opinion, properly written Pascal programs only need >intensional comments, stating the purpose of chunks of code -- the primary >purpose of comments is to allow the reader/maintenance programmer to SKIP >code that is irrelevant to him/her! In the example a comment would have >done no good (especially not a misleading one)! My point was, and is, that assigning "meaningful" spellings to identifiers can be misleading. Sometimes we think that the spelling is actually denoting the meaning when in reality its meaning is only valid in the context of the programming language. > >>This allows you to change the counter range by changing only the value of >>a constant and buys you the advantages of named types. > >In general I don't agree: you use the wrong principle. Constants and >Types shall only be named if they can be misunderstood, or are used in >more than one place. And it is OK to have the constants 0 and 1 in ranges. >Otherwise you commit a sin similar to over-commenting, and use to many >different names. In the above example, > > type string_index = 1..whatever; > var r: stringindex; > >is more readable---note that "string_index" is a better type name than >"counterrange": it shows the intension of the type, not its applications! >Of course, "r" should be called something else (not apparent from the >example). This is wholly a matter of style. The caveat of only naming constants etc if they are subject to use in more than one place assumes the program will not be modified in the future to require such multiple usage. Again, it's really a matter of style, it's just that I make it a habit to use such a style which gives me, I claim, more flexibility at low cost.