draper (12/24/82)
This is a follow-up to Tom Anderson's cnest, which checked for nested comments. His effort inspired me to implement something I had been wanting to do for a long time.

cchk (C program checker) is intended to run a fast pre-check on a C program to weed out certain errors that the compiler either diagnoses poorly or ignores.

1) It checks all kinds of brackets for correct matching (including both kinds of quotes and comments).
2) It checks your indentation, which it uses to produce better guesses at what your underlying mistake was.
3) It looks out for 3 well-known traps in C:
   a) dangling elses -- an else gets attached to a different if from the one you intended.
   b) equality/assignment confusion -- you put '=' instead of '==' in a condition.
   c) nested comments -- if you nest comments, for instance if you comment out a section of code that already has comments, you will not get the effect you want.

I have posted the source and a manual entry to net.sources. The latter gives more details, and also an argument about why I wrote it, which I reproduce here:

"cchk" was written as a result of the following observations:

1) In Unix, modularity suggests that it is appropriate to have different programs with different special expertise where other systems would cram them all into one program. Thus lint incorporates special knowledge about type-checking and portability considerations that would be inappropriate in a compiler. cchk, like lint, takes advantage of the fact that since it is not the compiler it can be wrong some of the time without preventing anyone from doing anything.

2) C has, in my opinion, some bad choices in its syntax that cause frequent errors by users. It turns out, though, that these can largely be checked for cheaply, which alleviates the original poor design choice. These are:
   a) Not supporting nested comments (nor warning about them in the compiler).
   b) Not having an "endif" (or "fi") closer to terminate if statements, thus leaving users open to the dangling else problem. (This is the problem that if you have nested if statements the following else will get bound to the nearest preceding one, which is not always the intuitively reasonable one.) This is especially troublesome, as it means among other things that if you modify a program by adding an else clause to an existing if statement, you may have to modify (by adding braces) not the if statement to which you are attaching the else, but a nested if statement acting as its "then" clause.
   c) The use of '=' for assignment, following Fortran's bad usage. It seems to be the case that both '=' and '==' get seen and mentally read as "equals", so that it is hard to spot if you write '=' for '==' in conditionals, an error that may happen either because of the language-promoted confusion itself, or because of a typing slip (which is then hard to spot).

3) The C compiler produces outstandingly unhelpful error messages as a rule, from the point of view of a user who wants to make corrections as fast as possible. Once past the beginner stage, however, a user can usually do all right by ignoring the text of the error message, which almost never tells her/him what to correct, and attending to the line number: generally when your attention is directed to only a line or two you can tell what is wrong. This breaks down when the compiler fails to generate anything like a helpful line number. That usually happens, however, in cases of failure to match brackets of some sort -- something which is easy for another program to check. Furthermore, attending to the user's indentation usually allows accurate diagnoses and helpful messages to be generated in just these cases.
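
To make "checked for cheaply" concrete, here is a minimal sketch in C of two of the checks named above: spotting a comment opener inside an already-open comment, and finding the first bracket that fails to match. This is not cchk's actual source (that is the posting in net.sources); the function names count_nested_openers and first_bracket_error and the MAX_DEPTH limit are hypothetical, and the sketch assumes the simplification that quotes and indentation are ignored, which the real program does not do.

```c
#include <stddef.h>

#define MAX_DEPTH 256   /* hypothetical nesting limit for this sketch */

/* Count comment openers that appear inside an already-open comment.
   Simplification (assumption): string and character constants are
   not skipped, so an opener inside a string is counted too. */
int count_nested_openers(const char *src)
{
    int in_comment = 0;
    int nested = 0;
    size_t i = 0;

    while (src[i] != '\0') {
        if (src[i] == '/' && src[i + 1] == '*') {
            if (in_comment)
                nested++;           /* opener seen while a comment is open */
            in_comment = 1;
            i += 2;
        } else if (in_comment && src[i] == '*' && src[i + 1] == '/') {
            in_comment = 0;         /* the first closer ends the comment */
            i += 2;
        } else {
            i++;
        }
    }
    return nested;
}

/* Return the offset of the first closer that does not match the most
   recent opener, or the offset of the innermost unclosed opener, or
   -1 if (), [] and {} all balance.  Comments and quotes are treated
   as ordinary text in this sketch. */
long first_bracket_error(const char *src)
{
    char stack[MAX_DEPTH];
    long pos[MAX_DEPTH];
    int depth = 0;
    long i;

    for (i = 0; src[i] != '\0'; i++) {
        char c = src[i];

        if (c == '(' || c == '[' || c == '{') {
            if (depth == MAX_DEPTH)
                return i;           /* too deep for this sketch */
            stack[depth] = c;
            pos[depth] = i;
            depth++;
        } else if (c == ')' || c == ']' || c == '}') {
            char want = (c == ')') ? '(' : (c == ']') ? '[' : '{';

            if (depth == 0 || stack[depth - 1] != want)
                return i;           /* closer with no matching opener */
            depth--;
        }
    }
    return depth > 0 ? pos[depth - 1] : -1;
}
```

Both functions make a single pass over the text, which is why a pre-check of this kind can run much faster than a compile. What the sketch cannot do is say whether the opener or the closer is the real mistake; that is the guess for which indentation is needed.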

cchk, then, attempts to address these points largely by checking bracket matches and using indentation to guess what the real problem was -- whether a missing opener, a missing closer, wrong indentation, or some other mistake such as a spurious character. Like the compiler, it has only a fair chance of recovering after an error and commenting intelligently on the remaining code. However, its relatively fast running time means that correcting only the first error in each cycle is not too time-consuming.

Steve Draper
UCSD, San Diego
ucbvax!sdcsvax!sdcsla!draper
draper@nprdc