[net.lang] Proofs &c

chris@umcp-cs.UUCP (Chris Torek) (07/09/86)

In article <12221138454.23.PATTIS@WASHINGTON.ARPA> PATTIS@WASHINGTON.ARPA
(Richard Pattis) writes:
>For introductory programming classes, I try to emphasize the teaching
>of critical thinking.  I attempt teach techniques for trying to
>prove that code is incorrect (such an activity may be labeled as
>aggressive debugging - even in the absence of manifest bugs).

This can be done `on line' as well: it is good practice when testing
code to try to provoke any `boundary conditions' or unusual code
paths, as such bugs tend to show up less often, but more dramatically
(or drastically), simply because the program is `time-tested' and
now `can be relied upon'.

>Mathematically proving and disproving code are identical,

A nit: not really.  Disproving can be done with a single counterexample,
but a lack of counterexamples does not constitute a proof.

>but in training future programmers, I would much prefer that my
>students say, "I haven't found anything wrong with my code yet,"
>to the all too frequently heard, "It's correct."

A proof of correctness does not mean that a piece of code works,
for the proof can be wrong or for some reason inapplicable, so
I think this is indeed a good thing.  It is, however, easy to
fall into the trap of thinking `no one has found anything wrong
with it yet, so it must be right'.  A few embarrassing errors
can do wonders for one's humility:

>Of course, it's easier to spot errors in someone else's code, so
>one good way to teach this skill is to ask students to analyze code
>written by others (known to be wrong) and find any errors.

You might also have students analyse each other's code.

I wonder, though, if it is indeed easier to spot errors in another's
code, or whether there is just more incentive for that than for
finding one's own errors.  On occasion I will look at some of my
own old code and note `obvious' mistakes.  If everyone did this we
might have much better code available, but on the other hand we
would probably have much less code as well, though on the third
hand (first foot?) that might not be so bad after all :-).

>On formalism: I believe that formalism is useful only when it can
>be applied to extend what we can do, or accurately check what we
>have done. Formalism for its own sake is pedantry.  For example,
>I advocate the use of simple control structures such as Pascal's,
>which subscribe to the 1-in/1-out rule.  Yet in Modula-2 we may
>violate this rule by using LOOP with EXIT and subroutines that use
>RETURN.  I use these features to simply implement simple algorithms:
>often such code is easier to understand, because it omits the
>introduction and use of extraneous state variables and other
>circuitous control flow.

Indeed.  What I cannot understand is why anyone thinks that EXIT
and RETURN statements are bad.  It is trivial to come up with an
EXIT axiom, even if it requires the introduction of a pseudo-variable
in the formal proof.  (Simply mechanically insert a boolean in the
loop, and an `if' around the part of the loop that follows the
EXIT.  The code constructs are equivalent, so the formal constructs
are as well.)

>Some comments on these solutions: Search1 uses two RETURN statements,
>one in the FOR loop. Many programmers consider the premature RETURN
>from the FOR loop to be bad programming practice.

Again, I ask:  *Why*?  An argument in favour of RETURN, by _reductio
ad absurdum_:  Integer variables are bad because the actual variable
and the formal description do not match exactly, for computer
integers, unlike formals, can overflow and become negative or cause
a runtime exception.  Because computer integers are hard to describe
formally, using integers is bad programming practice.

[Incidentally, the introductory programming course at the University
of Maryland actually *does* avoid integers.  For their first year,
students use something called `CF Pascal', in which the only legal
data types are characters and files.  The students are supposed to
be able to prove their programs correct, and having few data types
(one) and `Turing-tape-like' stores (files) helps keep the proof
complexity down.  I do not believe that these restrictions are
sensible beyond the stage of learning to write and prove algorithms,
and I am not even certain they are sensible there.  It does provoke
novel solutions to old problems, however.]

>... now remove the ability to perform short-circuit boolean
>evaluation (SCBE) and rewrite this function in Pascal.  Some
>programmers may argue that SCBE should be disallowed for similar
>reasons of formalilty, because it doesn't support the intuitive
>semantics for boolean expressions (commutivity); in the example
>above, we must very carefully select our post-loop IF test.

Why is commutivity `intuitive' here?  Having used SCBE's for years,
I find it very intuitive; indeed, at least one of the last few Pascal
programs I wrote broke because of code like the following:

	if (n <> 0) and ((x mod n) = 0) then ...

I had forgotten that Pascal semantics implied commutivity, and had
taken advantage of a nonexistent feature.  The error was easily
corrected, but this does show that a commutative mindset is by no
means universal.  And again, it is not hard to formally express a
short-circuit evaluation, though in conventional formalism the
proof itself will `suggest' a series of IFs.

> PROCEDURE Search2 (VAR A   : ARRAY OF INTEGER;	      (* Efficiency *)
>  		     Key     : INTEGER;
>		     Default : INTEGER)           : INTEGER;
>  VAR I,AnswerIndex : INTEGER;
>  BEGIN 
>    AnswerIndex:= Default;
>    I:= 0;
>    WHILE (I <= HIGH(A)) AND (AnswerIndex = Default) DO
>      IF A[I] = Key
>        THEN AnswerIndex:= I END;	 (* Found first I: A[I]  = Key      *)
>      I:= I+1
>    END;
>    RETURN AnswerIndex
>  END Search2;

Curious: if the Default answer is a legal subscript for A, and the
desired value appears in A at that point and also at a later point,
this routine returns the later point.  For example, if A contains
the values (4, 7, 2, 7, 3), Key is 7, and Default is 1, this function
returns 3.  Perhaps this is the `subtlety' you mention (I thought
it fairly obvious).

[N.B.: The following points were quoted for the purpose of refutation;
I am *agreeing* with Pattis below.]

>Here again I have abstracted some quotations from Sale's book
>(page 401). ...
>
>  "(1) Program fragments containing EXIT statements or unusually
>   placed RETURN statements which are not at the termination points
>   of their procedures (call these 'early returns') have a flow and
>   execution which is not easy to understand....

I think you have provided sufficient examples of cases in which the
opposite is true.

>   (2) Proof techniques or constructively correct design procedures
>   are not easily applied to these constructs because they have complex
>   semantic definitions....

I refer again to my _reductio ad absurdum_ argument.

>   (3) Reading programs with EXIT statements and early returns can be
>   difficult, especially if long procedure bodies are involved.

Indeed.  It can be almost as difficult as reading procedures with `extra'
booleans that effect the same control flow.

>   Finding all the relevant embedded statements and in the case of EXITs
>   determining which loop they belong to will require visual parsing of the
>   text....

Understanding the code requires a parse and semantic analysis anyway.
To quote a certain old lady, `Where's the beef?'

>   (4) Other reasons for the constructs usually put forward are either
>   convenience or efficiency. Both arguments are at odds with the primary
>   aims of Modula-2, which are clarity, abstraction, and correctness....

See the answer to (1).

>   (5) Most programmers who are taught programming in accordance with
>   modern computer science concepts have thought patterns which never
>   evoke them."

This is good?

>In summary I think that formalism, when correctly applied, is
>wonderful. But formalism works best when applied to simple code.
>Before jumping into a formal analysis of code, I like to try to
>ensure that the code is written as simply and as easy to understand
>as possible. This precondition increases the chances that further
>formal analysis will supply insight.

Or to put it another way: why spend hours proving clumsy code when
you can prove a rewrite in a few minutes?
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

smd@umcp-cs.UUCP (Stanley Dunn) (07/09/86)

In article <2308@umcp-cs.UUCP> chris@maryland.UUCP (Chris Torek) writes:
>
>[Incidentally, the introductory programming course at the University
>of Maryland actually *does* avoid integers.  For their first year,
>students use something called `CF Pascal', in which the only legal
>data types are characters and files.  

For the first semester course (CMSC 112) the students use CF Pascal.
In the second semester course (CMSC 113) the students are exposed
tothe full language in a disciplined fashion.

>                                       The students are supposed to
>be able to prove their programs correct, and having few data types
>(one) and `Turing-tape-like' stores (files) helps keep the proof
>complexity down.

They are able to prove their programs correct.
But more importantly, they have learned how to solve 
problems rigorously.

>  I do not believe that these restrictions are
>sensible beyond the stage of learning to write and prove algorithms,
>and I am not even certain they are sensible there.  It does provoke
>novel solutions to old problems, however.]
>

I am sure that many others agree, however this is exactly
the point of view that people have to change.

Many students feel that CF Pascal is a "crutch" and that they should
be allowed to use the full language.  At least the students have an
excuse in that they are young and inexperienced.  The fact
of the matter is that the opposite is true - everything beyond
CF Pascal are merely shorthand notations for cumbersome operations
in CF Pascal, and that if the programmer has mastered the 
concept of Procedural Abstraction there should be no conceptual
difference.

Learning a science is not learning all the details about
the tools you are working with - that sort of training
makes you a technician.  To be a scientist or an engineer
requires that you have a deeper level of understanding 
of the properties of the tools you are using to solve problems.
Computer Science should be no different, and CF Pascal
helps development the thought processes of the 
students (with appropriate exercises, of course).
The projects you use have to be geared for CF Pascal without 
doing any arithmetic or random access manipulation.

Two examples of projects I gave my students to write
were an interpreter and a program for symbolic evaluaton of boolean
expressions.  There are many problems that are hard to solve 
because the concepts are difficult, and not because of CF Pascal.
In fact, many of the projects I used were just character and 
string processing.

This approach to teaching should not be condemmed without
a thorough understanding of the material, after which you 
can give a mature, educated evaluation.  Next week a short
course is being offered here (Dept. of Computer Science, UMCP)
for those interested in teaching Computer Science with 
the Mills' text and this structured approach (CF / D / O Pascal).
Harlan Mills, Vic Basili, John Gannon, and myself will
be going over the text, and relating personal experience
from teaching the two courses CMSC 112  and CMSC 113 here.

chris@umcp-cs.UUCP (Chris Torek) (07/09/86)

>In article <2308@umcp-cs.UUCP> chris@maryland.UUCP (Chris Torek) writes:
>>Incidentally, the introductory programming course at the University
>>of Maryland actually *does* avoid integers.  For their first year,
>>students use something called `CF Pascal', in which the only legal
>>data types are characters and files.  

In article <2313@umcp-cs.UUCP> smd@maryland.UUCP (Stanley Dunn) writes:
>For the first semester course (CMSC 112) the students use CF Pascal.
>In the second semester course (CMSC 113) the students are exposed
>to the full language in a disciplined fashion.

All right, one semester; that sounds a bit better to me, although I
do not know just what length of time is `best'.  Less than a semester
is likely to be too short; more than a year is likely to be too long.

>>                                       The students are supposed to
>>be able to prove their programs correct, and having few data types
>>(one) and `Turing-tape-like' stores (files) helps keep the proof
>>complexity down.

>They are able to prove their programs correct.

You mean those who make it through the course are able.  There will
always be those who do not make it---thus the words `supposed to be'.

>But more importantly, they have learned how to solve 
>problems rigorously.

This is true, but was not relevant to the original context to which
the quoted text was `incidental' (see the first quoted word!).

>>  I do not believe that these restrictions are
>>sensible beyond the stage of learning to write and prove algorithms,
>>and I am not even certain they are sensible there.  It does provoke
>>novel solutions to old problems, however.

>I am sure that many others agree, however this is exactly
>the point of view that people have to change.

Many others agree with what?  (Hold on to that thought for a moment.)

>Many students feel that CF Pascal is a "crutch" and that they should
>be allowed to use the full language.  At least the students have an
>excuse in that they are young and inexperienced.  The fact
>of the matter is that the opposite is true - everything beyond
>CF Pascal are merely shorthand notations for cumbersome operations
>in CF Pascal, and that if the programmer has mastered the 
>concept of Procedural Abstraction there should be no conceptual
>difference.

Back to the first paragraph: agree with what?  That the restrictions
are not sensible once a programmer understands algorithms?  Are
you arguing that they are?  I do not believe you are.  Rather,
you seem to think that I said that the CF Pascal restrictions are not
sensible in an introductory programming course.  I DID NOT SAY THAT.
I said that I am not positive that they are sensible.  I *think*
they are, but I cannot be certain: especially not certain, given
my limited experience with the results.

I *am* certain that `shorthand ... for cumbersome operations' *is*
necessary in order to get things done.  It is conceptually equivalent
for me to type in strings of ones and zeros now rather than ASCII
characters, but I would never get anything done that way.  I could
write a Turing machine simulator and write all my code in that: it
is conceptually equivalent to (say) C.  I would never get any useful
work done though.  A proper foundation is essential, but I am not
going to live in an unroofed basement.

>Learning a science is not learning all the details about
>the tools you are working with - that sort of training
>makes you a technician.  To be a scientist or an engineer
>requires that you have a deeper level of understanding 
>of the properties of the tools you are using to solve problems.

Agreed.

>Computer Science should be no different, and CF Pascal
>helps development the thought processes of the 
>students (with appropriate exercises, of course).

You seem very confident---but then you have taught the course and
seen the results.  All I have seen is `from the outside', where it
looks good, but I am not willing to make a final decision.

>This approach to teaching should not be condemmed without
>a thorough understanding of the material, after which you 
>can give a mature, educated evaluation.

I was not, I *am* not, condemning it! The original context to which
my remarks were `incidental' concerned program proofs; I wanted to
mention CF Pascal because it is such a provable language.  It may
also be a good language for teaching computer science; I *think*
so, but I will not say `this is incontrovertible fact'.  I repeat:
I am not *certain* that the restrictions in CF Pascal are sensible
in a first programming course.  If you are, fine; but you have more
information than I.

(Sorry to go on for so long, but I hate being misinterpreted.)

>Next week a short course is being offered here (Dept. of Computer
>Science, UMCP) for those interested in teaching Computer Science with 
>the Mills' text and this structured approach (CF / D / O Pascal).

How does one find out about these things?  I have participated in
a few short courses on Unix and C; but I heard of them only by
being asked to help teach!
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu