[net.unix-wizards] cchk - a new program to help weed out C syntax errors

draper (12/24/82)

This is a follow-up to Tom Anderson's cnest, which checked for nested comments.
His effort inspired me to implement something I had been wanting to do for
a long time.  cchk (C program checker) is intended to run a fast pre-check
on a C program to weed out certain errors that the compiler either
diagnoses poorly or ignores altogether.
1)  It checks all kinds of brackets for correct matching (including both
	kinds of quotes and comments).
2)  It checks your indentation, which it uses to produce better guesses at
	what your underlying mistake was.
3)  It looks out for three well-known traps in C:
 a) dangling elses -- an else gets attached to a different if from the one
	you intended.
 b) equality/assignment confusion -- you put '=' instead of '==' in a
	condition.
 c) nested comments -- if you nest comments, for instance by commenting
	out a section of code that already contains comments, the first */
	ends the outer comment and you will not get the effect you want.
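Traps a) and b) are easy to reproduce.  The following is a minimal sketch (the names dangling_else, enum branch and is_zero_buggy are hypothetical, chosen only for illustration) of code that compiles without error yet does not do what its layout suggests.  A present-day compiler with warnings enabled may flag both functions, but neither is an error.

```c
/* Hypothetical labels for which branch of dangling_else() ran. */
enum branch { NO_BRANCH, BOTH_POSITIVE, A_NOT_POSITIVE };

/* Trap a), dangling else: the indentation suggests the else belongs to
   the outer if, but C binds it to the nearest preceding if. */
enum branch dangling_else(int a, int b)
{
    enum branch taken = NO_BRANCH;
    if (a > 0)
        if (b > 0)
            taken = BOTH_POSITIVE;
    else
        taken = A_NOT_POSITIVE;   /* actually runs when a > 0 && b <= 0 */
    return taken;
}

/* Trap b), '=' for '==': the condition assigns 0 to x and then tests
   that value, so the branch is never taken, whatever x was. */
int is_zero_buggy(int x)
{
    if (x = 0)
        return 1;
    return 0;
}
```

Calling dangling_else(-1, 5) returns NO_BRANCH rather than A_NOT_POSITIVE, and is_zero_buggy(0) returns 0.  Braces around the inner if, or '==' in place of '=', fix the respective function.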

I have posted the source and a manual entry to net.sources.  The manual
entry gives more details, as well as an argument for why I wrote the
program, which I reproduce here:

 	"cchk" was written as a result of the following observations:

 1)  In Unix, modularity suggests that it is appropriate to have different
programs with different special expertise where other systems would cram
them all into one program.  Thus lint incorporates special knowledge about
type-checking and portability considerations that would be inappropriate in
a compiler.  cchk, like lint, takes advantage of the fact that, since it is
not the compiler, it can be wrong some of the time without preventing anyone
from doing anything.

 2)  C has, in my opinion, some bad choices in its syntax that cause
frequent errors by users.  It turns out, though, that these can largely be
checked for cheaply, which mitigates the original poor design choices.
These are:
 	a) Not supporting nested comments (nor warning about them in the
compiler).
 	b) Not having an "endif" (or "fi") closer to terminate if
statements, thus leaving users open to the dangling else problem.  (This is
the problem that, in nested if statements, a following else is bound to
the nearest preceding if, which is not always the intuitively reasonable
one.)  This is especially troublesome, as it means, among other things,
that if you modify a program by adding an else clause to an existing if
statement, you may have to modify (by adding braces) not the if statement
to which you are attaching the else, but a nested if statement acting as
its "then" clause.
 	c) The use of '=' for assignment, following Fortran's bad usage.
It seems that both '=' and '==' get seen and mentally read as "equals", so
it is hard to spot a '=' written for '==' in a conditional, whether the
error came from the confusion the language itself promotes or from a
simple typing slip.

 3) As a rule, the C compiler produces outstandingly unhelpful error
messages from the point of view of a user who wants to make corrections as
fast as possible.  Once past the beginner stage, however, a user can usually
do all right by ignoring the text of the error message, which almost never
tells her/him what to correct, and attending to the line number: generally,
when your attention is directed to only a line or two, you can tell what is
wrong.  This breaks down when the compiler fails to report anything like a
helpful line number.  That usually happens, however, when brackets of some
sort fail to match, which is something easy for another program to check.
Furthermore, attending to the user's indentation usually allows accurate
diagnoses and helpful messages to be generated in just these cases.
cchk, then, attempts to address these points largely by checking bracket
matches and using indentation to guess what the real problem was -- whether
a missing opener, a missing closer, wrong indentation, or some other
mistake such as a spurious character.  Like the compiler, it has only a
fair chance of recovering after an error and commenting intelligently on
the remaining code.  However, its relatively fast running time means that
correcting only the first error in each cycle is not too time-consuming.
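The bracket-matching half of that approach fits in a few dozen lines of C.  This is not cchk's code: it is a minimal sketch, assuming a NUL-terminated source buffer and hypothetical names (check_brackets, enum check_result, MAX_DEPTH), and it reports only the first problem found, without the indentation-based guessing described above.

```c
#include <stddef.h>

#define MAX_DEPTH 256   /* arbitrary nesting limit for this sketch */

/* Hypothetical result codes; a real checker would report far more. */
enum check_result {
    CHECK_OK,
    CHECK_MISMATCH,       /* closer does not match the innermost opener */
    CHECK_UNCLOSED,       /* bracket, quote or comment still open at end */
    CHECK_NESTED_COMMENT, /* comment opener found inside a comment */
    CHECK_TOO_DEEP        /* nesting exceeds MAX_DEPTH */
};

/* Check (), [] and {} for correct matching, skipping over string and
   character literals and comments.  Returns the first problem found. */
enum check_result check_brackets(const char *src)
{
    char stack[MAX_DEPTH];
    size_t depth = 0;

    for (const char *p = src; *p != '\0'; p++) {
        if (p[0] == '/' && p[1] == '*') {
            /* Scan to the closing star-slash, flagging a nested opener. */
            for (p += 2; *p != '\0' && !(p[0] == '*' && p[1] == '/'); p++)
                if (p[0] == '/' && p[1] == '*')
                    return CHECK_NESTED_COMMENT;
            if (*p == '\0')
                return CHECK_UNCLOSED;
            p++;            /* step onto the '/'; the outer p++ skips it */
        } else if (*p == '"' || *p == '\'') {
            char quote = *p++;
            while (*p != '\0' && *p != quote) {
                if (*p == '\\' && p[1] != '\0')
                    p++;    /* skip the escaped character */
                p++;
            }
            if (*p == '\0')
                return CHECK_UNCLOSED;
        } else if (*p == '(' || *p == '[' || *p == '{') {
            if (depth == MAX_DEPTH)
                return CHECK_TOO_DEEP;
            stack[depth++] = *p;
        } else if (*p == ')' || *p == ']' || *p == '}') {
            char opener = (*p == ')') ? '(' : (*p == ']') ? '[' : '{';
            if (depth == 0 || stack[--depth] != opener)
                return CHECK_MISMATCH;
        }
    }
    return depth == 0 ? CHECK_OK : CHECK_UNCLOSED;
}
```

A fuller version would also record the line number and indentation of each opener, so that on a mismatch it could guess whether an opener or a closer is the one actually missing.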
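The claim in 2) that the '='/'==' confusion can be checked for cheaply is easy to believe from a line-based heuristic like the one below.  It is a hypothetical sketch (the name suspicious_assignment is an assumption, not taken from cchk) that flags an '=' inside the parentheses following if or while unless it is part of ==, !=, <= or >=.  It will misfire on idioms such as while ((c = getchar()) != EOF) and on '=' inside string literals, which is tolerable for a checker that, unlike the compiler, is allowed to be wrong some of the time.

```c
#include <ctype.h>
#include <string.h>

/* Does c continue an identifier? */
static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical heuristic: return 1 if the parenthesised condition after
   the keyword "if" or "while" on this line contains an '=' that is not
   part of ==, !=, <= or >=; otherwise return 0. */
int suspicious_assignment(const char *line)
{
    static const char *const keywords[] = { "if", "while" };

    for (size_t k = 0; k < sizeof keywords / sizeof keywords[0]; k++) {
        size_t len = strlen(keywords[k]);
        for (const char *p = strstr(line, keywords[k]); p != NULL;
             p = strstr(p + 1, keywords[k])) {
            /* Whole words only: skip matches inside names like "notify". */
            if ((p > line && is_ident_char(p[-1])) || is_ident_char(p[len]))
                continue;
            const char *q = p + len;
            while (*q == ' ' || *q == '\t')
                q++;
            if (*q != '(')
                continue;
            /* Scan the condition up to its matching ')'. */
            for (int depth = 0; *q != '\0'; q++) {
                if (*q == '(') {
                    depth++;
                } else if (*q == ')') {
                    if (--depth == 0)
                        break;
                } else if (*q == '=') {
                    if (q[1] == '=') {
                        q++;    /* "==": skip its second character */
                    } else if (q[-1] != '!' && q[-1] != '<' && q[-1] != '>') {
                        return 1;
                    }
                }
            }
        }
    }
    return 0;
}
```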

				Steve Draper
				UCSD, San Diego
				ucbvax!sdcsvax!sdcsla!draper   draper@nprdc