[comp.lang.pascal] Comment Pollution

markh@csd4.milw.wisc.edu (Mark William Hopkins) (08/16/88)

     This is a new term to describe the experience I've had to go through
in going through several dozen kilobytes of programs over the last few days.
I would like to get my hands on the guy who said that you can never put too
many comments in your code.

     What I've found is that comments are often used to compensate for bad
programming in the following ways:

     (1) Bad variable naming (too cryptic or named in a way unrelated to its 
	 use.) -- thus making those otherwise unnecessary comments on the
	 variables use necessary.

The rule of thumb is:
	 If you've found it necessary to comment on the use of a variable
	 then you've given it the wrong name.

     (2) Hyfalutenated code, like this:

	 if A = 0 then Flag := true else Flag := false
	 {Flag is the condition that A is 0.}

	 Just write
	      Flag := (A = 0)
         and let the code speak for itself (or better yet, don't use a flag).
      
There's other examples more tangled than this, but this is the first and 
foremost one that comes to mind.

     (3) Bad formatting, like this:

	 while I < 0 do
	    begin
	       for I := 1 to 6 do
		 begin
                   read(A[I])
                 end; {for loop}
                readln;
            end {while loop}

I took a small example here, but imagine that this were a huge nested loop.
Imagine how much the formatting is compounding the problem of recognition
where all you are really saying is this:

	 while I < 0 do begin
	    for I := 1 to 6 do read(A[I]);
	    readln
         end

It makes it utterly unnecessary to indicate what your "ends" are going with
because you can see it clearly when the code is short enough for the
"begin" and "end" to occupy the same page.

This gets me to the second rule of thumb: If you find it necessary to make
extensive comments on your program text, then you've written the program
the wrong way. Again, let the code speak for itself and keep it short and 
simple.

     (4) Bad use of variables

I've seen stuff like this:

program ...
...
<3658 variable declarations>
x: integer; {local variable used in the procedure spfxdz.}
r: integer; {a counter.}
...

(A) Look at x.  If you're going to use it as a local variable then declare it
as such and don't clutter the main program with 3658 variables.

(B) Counters should be declared to be of a subrange type, e.g.

r: 1..StringSIZE

... after all, what else are subranges for?  The fact that you declare it
this way makes the comment unnecessary and makes the program more direct
and transparent.

There's others, but the general pattern is apparent: comments are being
used extensively to compensate for bad programming.

One of the most recent programs I modified was about 9k and 450 lines long.
Now when you write long programs, a good sense of style will compel you
to add a lot of comments.  That's the problem.  The program is already
long enough as is, the extra pollution just makes it all the more obscure.

I finally got that program down to 100 lines and 2k.  There were still a
couple (appropriate) comments in it, but nothing like the cancer I had
to weed out.  And that's the point: keep your code short, simple and direct 
and let it (or I should say: GET IT TO ...) speak for itself.

art@buengc.BU.EDU (A. R. Thompson) (08/17/88)

In article <6508@uwmcsd1.UUCP> markh@csd4.milw.wisc.edu (Mark William Hopkins) writes:
>
>The rule of thumb is:
>	 If you've found it necessary to comment on the use of a variable
>	 then you've given it the wrong name.

This does not always work a neatly as we would like.  Once, several years
ago, I and a colleague were working on a Pascal compiler that was to
implement the (then) proposed standard.  Instead of starting from scratch,
we modified an existing compiler.  There was a boolean variable spelled
"ttyinuse".  Upon reading the code we discovered that when the variable
"ttyinuse" was true the tty was NOT in use!

I'm not arguing in favor of excessive comments.  I am however saying that
assigning "meaningful" spellings to identifiers can be very tricky.

>There's other examples more tangled than this, but this is the first and 
>foremost one that comes to mind.
>
>     (3) Bad formatting, like this:
>
>	 while I < 0 do
>	    begin
>	       for I := 1 to 6 do
>		 begin
>                   read(A[I])
>                 end; {for loop}
>                readln;
>            end {while loop}
>
>I took a small example here, but imagine that this were a huge nested loop.
>Imagine how much the formatting is compounding the problem of recognition
>where all you are really saying is this:
>
>	 while I < 0 do begin
>	    for I := 1 to 6 do read(A[I]);
>	    readln
>         end
>
>It makes it utterly unnecessary to indicate what your "ends" are going with
>because you can see it clearly when the code is short enough for the
>"begin" and "end" to occupy the same page.

Er, this is just plain wrong code.  From Jensen and Wirth, "Pascal User
Manual and Report" 3rd ed. revised by Mickel and Miner, p. 39:  "The
control variable is left undefined upon normal exit from the for
statement."  That means that the variable "I" cannot legally be used as it
is in this example as it is undefined when it is used in the test of the
while statement.  Not all compilers catch this, though it can be done.

>
>(B) Counters should be declared to be of a subrange type, e.g.
>
>r: 1..StringSIZE
>
>... after all, what else are subranges for?  The fact that you declare it
>this way makes the comment unnecessary and makes the program more direct
>and transparent.

Hear hear!

Try this as a suggested improvement:

const
   mincounter=1;
   stringsize=whatehaveyou;

type
   counterrange=mincounter..stringsize;

var
   r:conterrange;

This allows you to change the counter range by changing only the value of
a constant and buys you the advantages of named types.

A subtle problem occurs when you use type integer willy-nilly when
subranges should be used.  The code for a for statement usually increments
the control variable and then tests to see if the final value has been
exceeded.  If the control variable assumes the value maxint this
incrementation will cause overflow.  So, the compiler must generate extra
code to be sure maxint will not be exceeded prior to the increment and
test step.  That test adds at least two instructions to each pass through
the loop.  If the loop is only executed a few times, it's no big deal, but
if the loop is executed thousands of times (not all that rare in real
programs) then many extra instructions are executed when a little careful
programming would avoid them.

>
>I finally got that program down to 100 lines and 2k.  There were still a
>couple (appropriate) comments in it, but nothing like the cancer I had
>to weed out.  And that's the point: keep your code short, simple and direct 
>and let it (or I should say: GET IT TO ...) speak for itself.

Again, hear, hear.  There is nothing like well written programs.  Well
written programs are (usually) self documenting.  Save comments for those
parts of the code that are difficult to follow because the solution is
inherently difficult to follow.  Remember, bad comments can be just as
misleading as bad code.

stuf@freja.dk (Kristoffer H. Holm) (08/19/88)

In article <840@buengc.BU.EDU> art@buengc.BU.EDU (A. R. Thompson) writes:
>...
>we modified an existing compiler.  There was a boolean variable spelled
>"ttyinuse".  Upon reading the code we discovered that when the variable
>"ttyinuse" was true the tty was NOT in use!
>I'm not arguing in favor of excessive comments.  I am however saying that
>assigning "meaningful" spellings to identifiers can be very tricky.

Why, it should have been named tty_not_in_use (or TtyNotInUse if you prefer
that)!  In my opinion, properly written Pascal programs only need 
intensional comments, stating the purpose of chunks of code -- the primary
purpose of comments is to allow the reader/maintenance programmer to SKIP
code that is irrelevant to him/her!  In the example a comment would have
done no good (especially not a misleading one)!

>...
>Er, this is just plain wrong code.  From Jensen and Wirth, "Pascal User
>Manual and Report" 3rd ed. revised by Mickel and Miner, p. 39:  "The
>control variable is left undefined upon normal exit from the for
>statement."  That means that the variable "I" cannot legally be used as it
>is in this example as it is undefined when it is used in the test of the
>while statement.  Not all compilers catch this, though it can be done.

Right. While we're at it: variables used as counters in for loops must
also be local to the enclosing <block> (program/procedure/function body)
of the loop, making "{loop variable in procedure xxxx}" comments dubious.

>Try this as a suggested improvement:

>const
>   mincounter=1;
>   stringsize=whatehaveyou;

>type
>   counterrange=mincounter..stringsize;

>var
>   r:conterrange;

>This allows you to change the counter range by changing only the value of
>a constant and buys you the advantages of named types.

In general I don't agree: you use the wrong principle.  Constants and
Types shall only be named if they can be misunderstood, or are used in
more than one place.  And it is OK to have the constants 0 and 1 in ranges.
Otherwise you commit a sin similar to over-commenting, and use to many
different names. In the above example,

	type	string_index = 1..whatever;
	var	r: stringindex;

is more readable---note that "string_index" is a better type name than
"counterrange": it shows the intension of the type, not its applications!
Of course, "r" should be called something else (not apparent from the
example).

Regards, Kristoffer.

---------------------------------------------------------------------------
Kristoffer H. Holm					eunet: stuf@diku.dk
Institute of Datalogy (1. sal N), Univ. of Copenhagen,
DK-2100 Copenhagen East, DENMARK
---------------------------------------------------------------------------

art@buengc.BU.EDU (A. R. Thompson) (08/31/88)

In article <3987@freja.dk> stuf@freja.dk (Kristoffer H. Holm) writes:
>In article <840@buengc.BU.EDU> art@buengc.BU.EDU (A. R. Thompson) writes:
>>...
>>we modified an existing compiler.  There was a boolean variable spelled
>>"ttyinuse".  Upon reading the code we discovered that when the variable
>>"ttyinuse" was true the tty was NOT in use!
>>I'm not arguing in favor of excessive comments.  I am however saying that
>>assigning "meaningful" spellings to identifiers can be very tricky.
>
>Why, it should have been named tty_not_in_use (or TtyNotInUse if you prefer
>that)!  In my opinion, properly written Pascal programs only need 
>intensional comments, stating the purpose of chunks of code -- the primary
>purpose of comments is to allow the reader/maintenance programmer to SKIP
>code that is irrelevant to him/her!  In the example a comment would have
>done no good (especially not a misleading one)!

My point was, and is, that assigning "meaningful" spellings to identifiers
can be misleading.  Sometimes we think that the spelling is actually
denoting the meaning when in reality its meaning is only valid in the
context of the programming language.

>
>>This allows you to change the counter range by changing only the value of
>>a constant and buys you the advantages of named types.
>
>In general I don't agree: you use the wrong principle.  Constants and
>Types shall only be named if they can be misunderstood, or are used in
>more than one place.  And it is OK to have the constants 0 and 1 in ranges.
>Otherwise you commit a sin similar to over-commenting, and use to many
>different names. In the above example,
>
>	type	string_index = 1..whatever;
>	var	r: stringindex;
>
>is more readable---note that "string_index" is a better type name than
>"counterrange": it shows the intension of the type, not its applications!
>Of course, "r" should be called something else (not apparent from the
>example).

This is wholly a matter of style.  The caveat of only naming constants etc
if they are subject to use in more than one place assumes the program will
not be modified in the future to require such multiple usage.  Again, it's
really a matter of style, it's just that I make it a habit to use such a
style which gives me, I claim, more flexibility at low cost.