[net.lang.mod2] Proofs &c

chris@umcp-cs.UUCP (Chris Torek) (07/09/86)
In article <12221138454.23.PATTIS@WASHINGTON.ARPA> PATTIS@WASHINGTON.ARPA
(Richard Pattis) writes:
>For introductory programming classes, I try to emphasize the teaching
>of critical thinking.  I attempt teach techniques for trying to
>prove that code is incorrect (such an activity may be labeled as
>aggressive debugging - even in the absence of manifest bugs).

This can be done `on line' as well: it is good practice when testing
code to try to provoke any `boundary conditions' or unusual code
paths, as such bugs tend to show up less often, but more dramatically
(or drastically), simply because the program is `time-tested' and
now `can be relied upon'.

>Mathematically proving and disproving code are identical,

A nit: not really.  Disproving can be done with a single counterexample,
but a lack of counterexamples does not constitute a proof.

>but in training future programmers, I would much prefer that my
>students say, "I haven't found anything wrong with my code yet,"
>to the all too frequently heard, "It's correct."

A proof of correctness does not mean that a piece of code works,
for the proof can be wrong or for some reason inapplicable, so
I think this is indeed a good thing.  It is, however, easy to
fall into the trap of thinking `no one has found anything wrong
with it yet, so it must be right'.  A few embarrassing errors
can do wonders for one's humility:

>Of course, it's easier to spot errors in someone else's code, so
>one good way to teach this skill is to ask students to analyze code
>written by others (known to be wrong) and find any errors.

You might also have students analyse each other's code.

I wonder, though, if it is indeed easier to spot errors in another's
code, or whether there is just more incentive for that than for
finding one's own errors.  On occasion I will look at some of my
own old code and note `obvious' mistakes.  If everyone did this we
might have much better code available, but on the other hand we
would probably have much less code as well, though on the third
hand (first foot?) that might not be so bad after all :-).

>On formalism: I believe that formalism is useful only when it can
>be applied to extend what we can do, or accurately check what we
>have done. Formalism for its own sake is pedantry.  For example,
>I advocate the use of simple control structures such as Pascal's,
>which subscribe to the 1-in/1-out rule.  Yet in Modula-2 we may
>violate this rule by using LOOP with EXIT and subroutines that use
>RETURN.  I use these features to simply implement simple algorithms:
>often such code is easier to understand, because it omits the
>introduction and use of extraneous state variables and other
>circuitous control flow.

Indeed.  What I cannot understand is why anyone thinks that EXIT
and RETURN statements are bad.  It is trivial to come up with an
EXIT axiom, even if it requires the introduction of a pseudo-variable
in the formal proof.  (Simply mechanically insert a boolean in the
loop, and an `if' around the part of the loop that follows the
EXIT.  The code constructs are equivalent, so the formal constructs
are as well.)

>Some comments on these solutions: Search1 uses two RETURN statements,
>one in the FOR loop. Many programmers consider the premature RETURN
>from the FOR loop to be bad programming practice.

Again, I ask:  *Why*?  An argument in favour of RETURN, by _reductio
ad absurdum_:  Integer variables are bad because the actual variable
and the formal description do not match exactly, for computer
integers, unlike formals, can overflow and become negative or cause
a runtime exception.  Because computer integers are hard to describe
formally, using integers is bad programming practice.

[Incidentally, the introductory programming course at the University
of Maryland actually *does* avoid integers.  For their first year,
students use something called `CF Pascal', in which the only legal
data types are characters and files.  The students are supposed to
be able to prove their programs correct, and having few data types
(one) and `Turing-tape-like' stores (files) helps keep the proof
complexity down.  I do not believe that these restrictions are
sensible beyond the stage of learning to write and prove algorithms,
and I am not even certain they are sensible there.  It does provoke
novel solutions to old problems, however.]

>... now remove the ability to perform short-circuit boolean
>evaluation (SCBE) and rewrite this function in Pascal.  Some
>programmers may argue that SCBE should be disallowed for similar
>reasons of formalilty, because it doesn't support the intuitive
>semantics for boolean expressions (commutivity); in the example
>above, we must very carefully select our post-loop IF test.

Why is commutivity `intuitive' here?  Having used SCBE's for years,
I find it very intuitive; indeed, at least one of the last few Pascal
programs I wrote broke because of code like the following:

	if (n <> 0) and ((x mod n) = 0) then ...

I had forgotten that Pascal semantics implied commutivity, and had
taken advantage of a nonexistent feature.  The error was easily
corrected, but this does show that a commutative mindset is by no
means universal.  And again, it is not hard to formally express a
short-circuit evaluation, though in conventional formalism the
proof itself will `suggest' a series of IFs.

> PROCEDURE Search2 (VAR A   : ARRAY OF INTEGER;	      (* Efficiency *)
>  		     Key     : INTEGER;
>		     Default : INTEGER)           : INTEGER;
>  VAR I,AnswerIndex : INTEGER;
>  BEGIN 
>    AnswerIndex:= Default;
>    I:= 0;
>    WHILE (I <= HIGH(A)) AND (AnswerIndex = Default) DO
>      IF A[I] = Key
>        THEN AnswerIndex:= I END;	 (* Found first I: A[I]  = Key      *)
>      I:= I+1
>    END;
>    RETURN AnswerIndex
>  END Search2;

Curious: if the Default answer is a legal subscript for A, and the
desired value appears in A at that point and also at a later point,
this routine returns the later point.  For example, if A contains
the values (4, 7, 2, 7, 3), Key is 7, and Default is 1, this function
returns 3.  Perhaps this is the `subtlety' you mention (I thought
it fairly obvious).

[N.B.: The following points were quoted for the purpose of refutation;
I am *agreeing* with Pattis below.]

>Here again I have abstracted some quotations from Sale's book
>(page 401). ...
>
>  "(1) Program fragments containing EXIT statements or unusually
>   placed RETURN statements which are not at the termination points
>   of their procedures (call these 'early returns') have a flow and
>   execution which is not easy to understand....

I think you have provided sufficient examples of cases in which the
opposite is true.

>   (2) Proof techniques or constructively correct design procedures
>   are not easily applied to these constructs because they have complex
>   semantic definitions....

I refer again to my _reductio ad absurdum_ argument.

>   (3) Reading programs with EXIT statements and early returns can be
>   difficult, especially if long procedure bodies are involved.

Indeed.  It can be almost as difficult as reading procedures with `extra'
booleans that effect the same control flow.

>   Finding all the relevant embedded statements and in the case of EXITs
>   determining which loop they belong to will require visual parsing of the
>   text....

Understanding the code requires a parse and semantic analysis anyway.
To quote a certain old lady, `Where's the beef?'

>   (4) Other reasons for the constructs usually put forward are either
>   convenience or efficiency. Both arguments are at odds with the primary
>   aims of Modula-2, which are clarity, abstraction, and correctness....

See the answer to (1).

>   (5) Most programmers who are taught programming in accordance with
>   modern computer science concepts have thought patterns which never
>   evoke them."

This is good?

>In summary I think that formalism, when correctly applied, is
>wonderful. But formalism works best when applied to simple code.
>Before jumping into a formal analysis of code, I like to try to
>ensure that the code is written as simply and as easy to understand
>as possible. This precondition increases the chances that further
>formal analysis will supply insight.

Or to put it another way: why spend hours proving clumsy code when
you can prove a rewrite in a few minutes?
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu