[comp.lang.misc] Threatening Pascal Loops

gast@lanai.cs.ucla.edu (David Gast) (04/19/88)

In article <3364@omepd> bobdi@omepd.UUCP (Bob Dietrich) writes:
>In article <11047@shemp.CS.UCLA.EDU> gast@lanai.UUCP (David Gast) writes:

>Sorry, but the existing ANSI/IEEE and ISO Pascal standards do not allow such
>modification.  [of the loop control variable].

Yes, I realize that.  That was exactly my point.  The program name was
illegal.  One might have thought the name was somewhat self-explanatory.
The main comment went on to say:

>>{This program is not legal pascal, but the compiler does not detect the
>> error.  If the scope of the index variable were local to the for loop
>> (ala Algol 68), instead of global, then this error would be detected
>> at compiler time. } 

>There is a concept called "threatening" explained under the
>for-statement (section 6.8.3.9). Basically, if a variable has been
>"threatened", it cannot be used as a for control variable. 

Dietrich goes on to explain what threatened means.  Are there any
compilers that detect every possible instance of a threatened
variable?

Presumably there must be some other verbage to the extent
that an index variable cannot be accessed after the loop ends.
The following program is also illegal, but again, I suspect that
most compilers do not detect the error.  The standard Berkeley 4.3
compiler does not.

	program ILLEGAL (output);
	var
		i : integer;
	procedure p ;
		begin
			writeln(i);   /* Variable I is undefined at this point */
		end;

	begin
		for i := 1 to 10 do
			writeln(i); 
		p;
	end.

Essentially, slightly over stated, one will have to guarantee that
a loop control variable is never used outside the scope of the for
loop or do extensive run-time checking.

Of course, you can put extra checks into the compiler to try and detect
these errors.  These checks take time and make the compiler bigger.

As long as you end up (de facto) not allowing the loop control variable
to be used outside the scope of the for loop, why not do the sensible
thing in the first place?  Why not decide *in the language definition*
that the loop control variable is local to the loop?  That is the
decision Algol 68 made and it works.  It is impossible to assign to a
loop control variable in Algol 68.  And no concept like "threatening"
is needed so it is easier to learn.

One could have the following syntax:

	for FOR-CONT-VAR : S-TYPE := LB to UB do STATEMENT

The one argument against this change is that such a change would make
previously valid pascal programs illegal.  But the use threatening also
makes previously, valid pascal programs illegal.  That is, a loop
control variable can be threatened without actually being assigned to.
Such a program would not have been illegal under the old standard, but
it is, as I understand it, under the new standard.

The old standard Pascal had many type insecurities.  Perhaps the new
standard eliminates all of these; probably it doesn't.  As I no longer
have to use Pascal, I do not care to investigate myself.  If the new
Pascal, however, does not have any type insecurities, then it is a far
different language.  The defining document and the compilers and run time
systems are also undoubtedly much bigger as well.

scl@virginia.acc.virginia.edu (ACC) (04/20/88)

In article <11369@shemp.CS.UCLA.EDU> gast@lanai.UUCP (David Gast) writes:
>In article <3364@omepd> bobdi@omepd.UUCP (Bob Dietrich) writes:
>>In article <11047@shemp.CS.UCLA.EDU> gast@lanai.UUCP (David Gast) writes:
>
...
>The following program is also illegal, but again, I suspect that
>most compilers do not detect the error.  The standard Berkeley 4.3
>compiler does not.
>
>	program ILLEGAL (output);
>	var
>		i : integer;
>	procedure p ;
>		begin
>			writeln(i);   /* Variable I is undefined at this point */
>		end;
>	
>	begin
>		for i := 1 to 10 do
>			writeln(i); 
>		p;
>	end.
>
>Essentially, slightly over stated, one will have to guarantee that
>a loop control variable is never used outside the scope of the for
>loop or do extensive run-time checking.
>

Unless the standard changes, Pascal compilers have no choice but to do
extensive run-time checking.

Nothing in the Pascal standard forbids you from using the index variable
any way you wish outside the for loop and within the block enclosing the
loop.  The above example would be perfectly legal if "i" were assigned
a legal value just after the loop and before the call to "p".
The real problem here is detecting the use of undefined variables. 
In addition to the for index becoming undefined at the end of
a for loop, local variables are undefined upon activation of their
enclosing procedure or function.  The variables returned by new() are
undefined.  After a call to put(), such as put(f), the buffer variable
f^ is undefined.  And so on.

It would be very nice if there were more hardware support for runtime
checking.  Imagine the compiler assigning to an undefined variable
a magic cookie that would cause a hardware exception if that variable was
ever read.  As a matter of fact such a beast exists on some CDC cyber
systems.  The cyber uses 1's complement arithmetic.  As such, there are
two representations for zero -- positive zero (all bits clear) and
negative zero (all bits set).  Negative zero can't be used in calculations.
The cyber has (had?) a FORTRAN compiler with an option to initialize all
data space to -0.  Then any attempt to use an uninitialized variable would
halt the program.  

Of course the classic tradeoff here is speed vs. security.  If more of
the dirty work could be shoved into hardware (as has been done with
virtual memory) then fast and secure compilers would be easier to build.
-- 
Steve Losen     scl@virginia.edu
University of Virginia Academic Computing Center

bobdi@omepd (Bob Dietrich) (04/21/88)

In article <11369@shemp.CS.UCLA.EDU> gast@lanai.UUCP (David Gast) writes:
> ...
>Dietrich goes on to explain what threatened means.  Are there any
>compilers that detect every possible instance of a threatened
>variable?

Yes, there are several. The rules were defined so that the violation is
easily detectable at compile time. In exchange, the rules are stricter than
what would be required if violations were checked at runtime (i.e., not all
threats to control variables are actually harmful).
>
>Presumably there must be some other verbage to the extent
>that an index variable cannot be accessed after the loop ends.

The control variable is undefined at the end of a for-statement, as long as
the loop has not been left by a goto-statement. More below.

>The following program is also illegal, but again, I suspect that
>most compilers do not detect the error.  The standard Berkeley 4.3
>compiler does not.

There's a lot the Berkley compiler doesn't do.
>
>  [code deleted]
>Essentially, slightly over stated, one will have to guarantee that
>a loop control variable is never used outside the scope of the for
>loop or do extensive run-time checking.
>
>Of course, you can put extra checks into the compiler to try and detect
>these errors.  These checks take time and make the compiler bigger.

Yes, I suppose you could do data-flow analysis of programs. I know of one
Pascal compiler that did: it was indeed very expensive and helped bring
about the concept of threatening. If you introduce a separate compilation
facility, things get much worse.

Unfortunately, you are only really addressing part of the more general
problem of use of undefined variables. Pascal has one of the few language
specifications around that actually discusses the concept of undefined
variables. It specifies when variables become defined (with a value) and
when they become undefined. In general, it is an error to use ANY undefined
variable in an expression. The example given just happens to use one of the
ways a variable may become undefined. If you delete the for-statement
altogether, leaving the variable i uninitialized, you may see what I'm
getting at.

In practice, however, what is definable in an axiomatic manner in the
specification is usually not implemented. To do proper detection of undefined
variables, you typically need a tagged architecture, or must put a little
more effort into you Pascal processor (compiler+runtime or interpreter).

Most architectures that are in popular use do not have a mechanism to tag a
variable as being undefined (there is no such thing as an "undefined
value"!). Lacking such aids, a processor must (optimally) do some flow
analysis to try and catch use of undefined variables at translation time, and
generate checks for those cases that are not determinable until runtime. This
involves at least a bit per variable, which must be passed around wherever
the variable is referenced. Not too bad for normal variables, but when you
consider that an array is undefined unless all its components are defined,
things can escalate quickly.

What you end up with is translation time analysis, runtime checks, and some
potentially large additional data structures. Most people apparently don't
feel the results are worth the expense, since it isn't commonly implemented.
Given how many times I've seen myself or others chase down bugs caused by
uninitialized or undefined variables, it's a shame.
>
>As long as you end up (de facto) not allowing the loop control variable
>to be used outside the scope of the for loop, why not do the sensible
>thing in the first place?  Why not decide *in the language definition*
>that the loop control variable is local to the loop?  That is the
>decision Algol 68 made and it works.  It is impossible to assign to a
>loop control variable in Algol 68.  And no concept like "threatening"
>is needed so it is easier to learn.
>
>One could have the following syntax:
>
>	for FOR-CONT-VAR : S-TYPE := LB to UB do STATEMENT

This alone does not cure the problem, because what if STATEMENT is an
invocation of the read procedure, an assignment to the control variable, or
a procedure call that passes the control as a variable parameter? You still
need rules similar to the threat concept. Furthermore, you have now
introduced a brand new place that variables can be declared. Not necessarily
evil, but another concept to specify and explain.
>
>The one argument against this change is that such a change would make
>previously valid pascal programs illegal.  But the use threatening also
>makes previously, valid pascal programs illegal.  That is, a loop
>control variable can be threatened without actually being assigned to.
>Such a program would not have been illegal under the old standard, but
>it is, as I understand it, under the new standard.

Just to be clear, there is only one official standard for Pascal right now,
embodied in the ANSI/IEEE and ISO standards. Extended Pascal is not yet a
standard, as it is still under development, but nearing completion of this
go-around. The concept of threatening, however, is in the current standard.
The only changes in Extended Pascal (if there are any, I can't remember) are
for new language features.
>
>The old standard Pascal had many type insecurities.  Perhaps the new
>standard eliminates all of these; probably it doesn't.  As I no longer
>have to use Pascal, I do not care to investigate myself.  If the new
>Pascal, however, does not have any type insecurities, then it is a far
>different language.  The defining document and the compilers and run time
>systems are also undoubtedly much bigger as well.

If by the "old standard" you mean the Pascal User Manual and Report by Jensen
and Wirth, I agree that this was a de facto standard and had many problems.
Hence the current standard, which specifies name type compatibility (J&W was
foggy on this point). Other than type compatibility, I know of no other "type
insecurities", which is an entirely different subject than the one we have
been discussing. BTW, the Third Edition of J&W was extensively revised to
incorporate decisions made in the standard.

As far as "new Pascal" goes, Extended Pascal does not replace the current
standard. They will co-exist as long as there is a desire to keep both alive.
If you want a simpler language, use the current standard. If you want the
features people have been frequently adding to the language, like modularity,
string handling, etc., use Extended Pascal. Either way, you pay for what you
get (or don't get). Furthermore, Extended Pascal is upward compatible with
the current standard, unless you happen to have a variable called "module" or
one of the few other new reserved words.

Perhaps this is a bit late to say this, but I think I agree with your aims of
security and simplicity. I've just been trying to jive reality with what you
said, and point out some of the problems of achieving those aims. I like my
programs to work; that's why I avoid using C whenever possible.

				Bob Dietrich
				Intel Corporation, Hillsboro, Oregon
				(503) 696-4400 or 2092(messages x4188,2111)
		usenet:		tektronix!ogcvax!omepd!bobdi
		  or		tektronix!psu-cs!omepd!bobdi
		  or		ihnp4!verdix!omepd!bobdi

barmar@think.COM (Barry Margolin) (04/22/88)

In article <3401@omepd> bobdi@omepd.UUCP (Bob Dietrich) writes:
>Just to be clear, there is only one official standard for Pascal right now,
>embodied in the ANSI/IEEE and ISO standards.

That's TWO official standards, since the ANSI Pascal standard and the
ISO standard are not equivalent.  I don't even think one is a subset
of the other.  I think they differ incompatibly in a couple of areas,
although I don't know what they are offhand.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
uunet!think!barmar

art@maui.Berkeley.EDU (Arthur Goldberg) (04/22/88)

A language developed i  at IBM research called Hermes (formerly NIL)
addresses these issues with the concept of "typestate".   The basic
idea is that at COMPILE-TIME the current state of each variable is
known at each line in the p  code.  For example,a pointer may be NIL,
pointing to an uninitialized data item, or pointing to an initialized item.
The operations allowed on a variable depend on its type state. Only
initizlized variables can be found on the right hand side of assignments,
or sendt in messages, for example.  See Strom et . al . publications
from IBM research.

Pardon the appearance.   Im using an unidentified editor.

Arthur Goldberg
UCLA computer Science Department
art@cs.ucla.edu

bobdi@omepd (Bob Dietrich) (04/26/88)

In article <20085@think.UUCP> barmar@fafnir.think.com.UUCP (Barry Margolin) writes:
>In article <3401@omepd> bobdi@omepd.UUCP (Bob Dietrich) writes:
>>Just to be clear, there is only one official standard for Pascal right now,
>>embodied in the ANSI/IEEE and ISO standards.
>
>That's TWO official standards, since the ANSI Pascal standard and the
>ISO standard are not equivalent.  I don't even think one is a subset
>of the other.  I think they differ incompatibly in a couple of areas,
>although I don't know what they are offhand.
>
>Barry Margolin
>Thinking Machines Corp.
>
>barmar@think.com
>uunet!think!barmar

That's what I get for trying to simplify. Yes, there are actually two
standards. The ANSI/IEEE standard corresponds to level 0 of the ISO standard,
with about four wording differences. The wording differences are in areas
like how the file parameter is bound in the read (ANSI/IEEE wording attempts
to prohibit the file variable in "read(file_array[i], i, j)" from changing).
The wording differences have little or no impact on most Pascal users. As it
is, the ISO Interpretations Subgroup (which I have participated in) is moving
the ISO Standard toward the intent of the ANSI/IEEE wording.

For those of you wondering, level 1 of ISO Pascal adds conformant arrays,
which is the only other area of difference between the Standards.

				Bob Dietrich
				Intel Corporation, Hillsboro, Oregon
				(503) 696-4400 or 2092(messages x4188,2111)
		usenet:		tektronix!ogcvax!omepd!bobdi
		  or		tektronix!psu-cs!omepd!bobdi
		  or		ihnp4!verdix!omepd!bobdi