gast@lanai.cs.ucla.edu (David Gast) (04/19/88)
In article <3364@omepd> bobdi@omepd.UUCP (Bob Dietrich) writes:
>In article <11047@shemp.CS.UCLA.EDU> gast@lanai.UUCP (David Gast) writes:

>Sorry, but the existing ANSI/IEEE and ISO Pascal standards do not allow such
>modification. [of the loop control variable].

Yes, I realize that. That was exactly my point. The program name was
"illegal". One might have thought the name was somewhat self-explanatory.
The main comment went on to say:

>>{This program is not legal pascal, but the compiler does not detect the
>> error. If the scope of the index variable were local to the for loop
>> (ala Algol 68), instead of global, then this error would be detected
>> at compiler time. }

>There is a concept called "threatening" explained under the
>for-statement (section 6.8.3.9). Basically, if a variable has been
>"threatened", it cannot be used as a for control variable.

Dietrich goes on to explain what threatened means. Are there any
compilers that detect every possible instance of a threatened
variable?

Presumably there must be some other verbiage to the extent
that an index variable cannot be accessed after the loop ends.

The following program is also illegal, but again, I suspect that
most compilers do not detect the error. The standard Berkeley 4.3
compiler does not.

    program ILLEGAL (output);
    var
        i : integer;

    procedure p;
    begin
        writeln(i);    { Variable i is undefined at this point }
    end;

    begin
        for i := 1 to 10 do
            writeln(i);
        p;
    end.

Essentially, slightly overstated, one will have to guarantee that
a loop control variable is never used outside the scope of the for
loop or do extensive run-time checking.

Of course, you can put extra checks into the compiler to try and detect
these errors. These checks take time and make the compiler bigger.

As long as you end up (de facto) not allowing the loop control variable
to be used outside the scope of the for loop, why not do the sensible
thing in the first place? Why not decide *in the language definition*
that the loop control variable is local to the loop? That is the
decision Algol 68 made and it works. It is impossible to assign to a
loop control variable in Algol 68. And no concept like "threatening"
is needed so it is easier to learn.

One could have the following syntax:

    for FOR-CONT-VAR : S-TYPE := LB to UB do STATEMENT

The one argument against this change is that such a change would make
previously valid Pascal programs illegal. But the use of threatening also
makes previously valid Pascal programs illegal. That is, a loop
control variable can be threatened without actually being assigned to.
Such a program would not have been illegal under the old standard, but
it is, as I understand it, under the new standard.

The old standard Pascal had many type insecurities. Perhaps the new
standard eliminates all of these; probably it doesn't. As I no longer
have to use Pascal, I do not care to investigate myself. If the new
Pascal, however, does not have any type insecurities, then it is a far
different language. The defining document and the compilers and run time
systems are also undoubtedly much bigger as well.
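A minimal sketch of what a loop-local control variable would ask of a
compiler, written in Python over a toy statement list rather than real
Pascal. The names For, Assign, Use and check are invented for this
illustration and belong to no actual compiler. Because the loop itself
declares the control variable, a use after the loop is just an undeclared
name, and a direct assignment inside the body is caught by one set lookup.
Only direct assignment and direct use are modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class Assign:
    target: str                 # variable being assigned to


@dataclass
class Use:
    name: str                   # variable being read, e.g. writeln(name)


@dataclass
class For:
    var: str                    # control variable, declared by the loop itself
    body: list = field(default_factory=list)


def check(stmts, declared=frozenset(), loop_vars=frozenset()):
    """Return a list of error strings for a block of toy statements."""
    errors = []
    for stmt in stmts:
        if isinstance(stmt, Assign):
            if stmt.target in loop_vars:
                errors.append(f"assignment to loop control variable {stmt.target}")
            elif stmt.target not in declared:
                errors.append(f"undeclared variable {stmt.target}")
        elif isinstance(stmt, Use):
            if stmt.name not in declared:
                errors.append(f"undeclared variable {stmt.name}")
        elif isinstance(stmt, For):
            # The control variable is visible only while checking the body.
            errors += check(stmt.body, declared | {stmt.var}, loop_vars | {stmt.var})
    return errors


# Toy version of the ILLEGAL program: i is used after the loop has ended,
# and (as an extra case) assigned to inside the loop body.
program = [
    For("i", [Use("i"), Assign("i")]),
    Use("i"),
]
errors = check(program)
for message in errors:
    print(message)
```

Both errors are reported without any flow analysis, which is the point of
the proposal; the price is a new place where variables can be declared.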
scl@virginia.acc.virginia.edu (ACC) (04/20/88)
In article <11369@shemp.CS.UCLA.EDU> gast@lanai.UUCP (David Gast) writes:
>In article <3364@omepd> bobdi@omepd.UUCP (Bob Dietrich) writes:
>>In article <11047@shemp.CS.UCLA.EDU> gast@lanai.UUCP (David Gast) writes:
> ...
>The following program is also illegal, but again, I suspect that
>most compilers do not detect the error. The standard Berkeley 4.3
>compiler does not.
>
>    program ILLEGAL (output);
>    var
>        i : integer;
>    procedure p ;
>    begin
>        writeln(i);    { Variable I is undefined at this point }
>    end;
>
>    begin
>        for i := 1 to 10 do
>            writeln(i);
>        p;
>    end.
>
>Essentially, slightly overstated, one will have to guarantee that
>a loop control variable is never used outside the scope of the for
>loop or do extensive run-time checking.
>

Unless the standard changes, Pascal compilers have no choice but to do
extensive run-time checking. Nothing in the Pascal standard forbids you
from using the index variable any way you wish outside the for loop and
within the block enclosing the loop. The above example would be perfectly
legal if "i" were assigned a legal value just after the loop and before
the call to "p".

The real problem here is detecting the use of undefined variables. In
addition to the for index becoming undefined at the end of a for loop,
local variables are undefined upon activation of their enclosing procedure
or function. The variables returned by new() are undefined. After a call
to put(), such as put(f), the buffer variable f^ is undefined. And so on.

It would be very nice if there were more hardware support for runtime
checking. Imagine the compiler assigning to an undefined variable a magic
cookie that would cause a hardware exception if that variable were ever
read. As a matter of fact such a beast exists on some CDC Cyber systems.
The Cyber uses ones' complement arithmetic. As such, there are two
representations for zero -- positive zero (all bits clear) and negative
zero (all bits set). Negative zero can't be used in calculations. The
Cyber has (had?) a FORTRAN compiler with an option to initialize all data
space to -0. Then any attempt to use an uninitialized variable would halt
the program.

Of course the classic tradeoff here is speed vs. security. If more of the
dirty work could be shoved into hardware (as has been done with virtual
memory) then fast and secure compilers would be easier to build.

--
Steve Losen     scl@virginia.edu
University of Virginia Academic Computing Center
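As a rough illustration of the trap-on-read idea, the Python sketch below
models a small memory of ones' complement words preset to negative zero.
The 60-bit word size, the Memory class and the UndefinedRead exception are
assumptions made for the sketch. It models the behaviour as described in
the post above, not the actual Cyber hardware or its FORTRAN compiler.

```python
WORD_BITS = 60                      # assumed word size for this sketch
MASK = (1 << WORD_BITS) - 1
NEG_ZERO = MASK                     # all bits set: ones' complement negative zero


class UndefinedRead(Exception):
    """Raised when a word still holding the preset pattern is read."""


def encode(n):
    """Encode a Python int as a ones' complement word."""
    if abs(n) >= 1 << (WORD_BITS - 1):
        raise OverflowError("value does not fit in one word")
    # Negative values are the bitwise complement of their magnitude.
    return n if n >= 0 else ~(-n) & MASK


def decode(word):
    """Decode a ones' complement word, trapping on negative zero."""
    if word == NEG_ZERO:
        raise UndefinedRead("read of uninitialized word")
    if word >> (WORD_BITS - 1):     # sign bit set: negative value
        return -(~word & MASK)
    return word


class Memory:
    """Hypothetical data space whose every word starts out as -0."""

    def __init__(self, size):
        self.words = [NEG_ZERO] * size

    def store(self, addr, value):
        self.words[addr] = encode(value)

    def load(self, addr):
        return decode(self.words[addr])


mem = Memory(4)
mem.store(0, -5)
print(mem.load(0))                  # a stored value reads back normally
try:
    mem.load(1)                     # never stored: still holds -0
except UndefinedRead as exc:
    print("trap:", exc)
```

In this model the check costs a comparison on every load, which is the
work the post suggests pushing into hardware. A stored zero is always
encoded as positive zero, so it never collides with the preset pattern.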
bobdi@omepd (Bob Dietrich) (04/21/88)
In article <11369@shemp.CS.UCLA.EDU> gast@lanai.UUCP (David Gast) writes:
> ...
>Dietrich goes on to explain what threatened means. Are there any
>compilers that detect every possible instance of a threatened
>variable?

Yes, there are several. The rules were defined so that the violation is
easily detectable at compile time. In exchange, the rules are stricter than
what would be required if violations were checked at runtime (i.e., not all
threats to control variables are actually harmful).

>
>Presumably there must be some other verbiage to the extent
>that an index variable cannot be accessed after the loop ends.

The control variable is undefined at the end of a for-statement, as long as
the loop has not been left by a goto-statement. More below.

>The following program is also illegal, but again, I suspect that
>most compilers do not detect the error. The standard Berkeley 4.3
>compiler does not.

There's a lot the Berkeley compiler doesn't do.

>
> [code deleted]
>Essentially, slightly overstated, one will have to guarantee that
>a loop control variable is never used outside the scope of the for
>loop or do extensive run-time checking.
>
>Of course, you can put extra checks into the compiler to try and detect
>these errors. These checks take time and make the compiler bigger.

Yes, I suppose you could do data-flow analysis of programs. I know of one
Pascal compiler that did: it was indeed very expensive and helped bring
about the concept of threatening. If you introduce a separate compilation
facility, things get much worse.

Unfortunately, you are only really addressing part of the more general
problem of use of undefined variables. Pascal has one of the few language
specifications around that actually discusses the concept of undefined
variables. It specifies when variables become defined (with a value) and
when they become undefined. In general, it is an error to use ANY undefined
variable in an expression. The example given just happens to use one of the
ways a variable may become undefined. If you delete the for-statement
altogether, leaving the variable i uninitialized, you may see what I'm
getting at.

In practice, however, what is definable in an axiomatic manner in the
specification is usually not implemented. To do proper detection of
undefined variables, you typically need a tagged architecture, or must put
a little more effort into your Pascal processor (compiler+runtime or
interpreter). Most architectures that are in popular use do not have a
mechanism to tag a variable as being undefined (there is no such thing as
an "undefined value"!).

Lacking such aids, a processor must (optimally) do some flow analysis to
try and catch use of undefined variables at translation time, and generate
checks for those cases that are not determinable until runtime. This
involves at least a bit per variable, which must be passed around wherever
the variable is referenced. Not too bad for normal variables, but when you
consider that an array is undefined unless all its components are defined,
things can escalate quickly.

What you end up with is translation time analysis, runtime checks, and some
potentially large additional data structures. Most people apparently don't
feel the results are worth the expense, since it isn't commonly
implemented. Given how many times I've seen myself or others chase down
bugs caused by uninitialized or undefined variables, it's a shame.

>
>As long as you end up (de facto) not allowing the loop control variable
>to be used outside the scope of the for loop, why not do the sensible
>thing in the first place? Why not decide *in the language definition*
>that the loop control variable is local to the loop? That is the
>decision Algol 68 made and it works. It is impossible to assign to a
>loop control variable in Algol 68. And no concept like "threatening"
>is needed so it is easier to learn.
>
>One could have the following syntax:
>
>    for FOR-CONT-VAR : S-TYPE := LB to UB do STATEMENT

This alone does not cure the problem, because what if STATEMENT is an
invocation of the read procedure, an assignment to the control variable, or
a procedure call that passes the control variable as a variable parameter?
You still need rules similar to the threat concept. Furthermore, you have
now introduced a brand new place that variables can be declared. Not
necessarily evil, but another concept to specify and explain.

>
>The one argument against this change is that such a change would make
>previously valid Pascal programs illegal. But the use of threatening also
>makes previously valid Pascal programs illegal. That is, a loop
>control variable can be threatened without actually being assigned to.
>Such a program would not have been illegal under the old standard, but
>it is, as I understand it, under the new standard.

Just to be clear, there is only one official standard for Pascal right now,
embodied in the ANSI/IEEE and ISO standards. Extended Pascal is not yet a
standard, as it is still under development, but nearing completion of this
go-around. The concept of threatening, however, is in the current standard.
The only changes in Extended Pascal (if there are any, I can't remember)
are for new language features.

>
>The old standard Pascal had many type insecurities. Perhaps the new
>standard eliminates all of these; probably it doesn't. As I no longer
>have to use Pascal, I do not care to investigate myself. If the new
>Pascal, however, does not have any type insecurities, then it is a far
>different language. The defining document and the compilers and run time
>systems are also undoubtedly much bigger as well.

If by the "old standard" you mean the Pascal User Manual and Report by
Jensen and Wirth, I agree that this was a de facto standard and had many
problems. Hence the current standard, which specifies name type
compatibility (J&W was foggy on this point). Other than type compatibility,
I know of no other "type insecurities", which is an entirely different
subject than the one we have been discussing. BTW, the Third Edition of J&W
was extensively revised to incorporate decisions made in the standard.

As far as "new Pascal" goes, Extended Pascal does not replace the current
standard. They will co-exist as long as there is a desire to keep both
alive. If you want a simpler language, use the current standard. If you
want the features people have been frequently adding to the language, like
modularity, string handling, etc., use Extended Pascal. Either way, you pay
for what you get (or don't get). Furthermore, Extended Pascal is upward
compatible with the current standard, unless you happen to have a variable
called "module" or one of the few other new reserved words.

Perhaps this is a bit late to say this, but I think I agree with your aims
of security and simplicity. I've just been trying to jibe reality with what
you said, and point out some of the problems of achieving those aims. I
like my programs to work; that's why I avoid using C whenever possible.

Bob Dietrich
Intel Corporation, Hillsboro, Oregon
(503) 696-4400 or 2092 (messages x4188,2111)
usenet: tektronix!ogcvax!omepd!bobdi
   or   tektronix!psu-cs!omepd!bobdi
   or   ihnp4!verdix!omepd!bobdi
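The runtime half of the scheme described above (a "defined" bit carried
with every variable, and an array counting as defined only when all of its
components are) can be sketched roughly as follows. This is an
illustrative Python model, not how any particular Pascal processor does
it; Scalar, Array, for_loop and UndefinedVariable are names made up for
the sketch.

```python
class UndefinedVariable(Exception):
    """Raised when an undefined variable is used in an expression."""


class Scalar:
    """A variable carrying one extra 'defined' bit alongside its value."""

    def __init__(self, name):
        self.name = name
        self.defined = False        # variables start out undefined
        self._value = None

    def set(self, value):
        self._value = value
        self.defined = True

    def get(self):
        if not self.defined:        # the runtime check on every use
            raise UndefinedVariable(self.name)
        return self._value

    def undefine(self):
        self.defined = False


class Array:
    """An array is defined only when every component is defined."""

    def __init__(self, name, size):
        self.name = name
        self.items = [Scalar(f"{name}[{k}]") for k in range(size)]

    @property
    def defined(self):
        return all(item.defined for item in self.items)

    def get_all(self):
        """Use the array as a whole, e.g. in a whole-array assignment."""
        if not self.defined:
            raise UndefinedVariable(self.name)
        return [item.get() for item in self.items]


def for_loop(control, first, last, body):
    """Run body for first..last, then mark the control variable undefined."""
    for value in range(first, last + 1):
        control.set(value)
        body()
    control.undefine()              # normal completion leaves it undefined


i = Scalar("i")
for_loop(i, 1, 3, lambda: print(i.get()))
try:
    i.get()                         # the use inside procedure p in ILLEGAL
except UndefinedVariable as exc:
    print("undefined:", exc)
```

Even in this toy form the cost is visible: one flag and one test per
scalar access, and a scan of every component flag whenever an array is
used as a whole.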
barmar@think.COM (Barry Margolin) (04/22/88)
In article <3401@omepd> bobdi@omepd.UUCP (Bob Dietrich) writes:
>Just to be clear, there is only one official standard for Pascal right now,
>embodied in the ANSI/IEEE and ISO standards.

That's TWO official standards, since the ANSI Pascal standard and the
ISO standard are not equivalent. I don't even think one is a subset
of the other. I think they differ incompatibly in a couple of areas,
although I don't know what they are offhand.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
uunet!think!barmar
art@maui.Berkeley.EDU (Arthur Goldberg) (04/22/88)
A language developed at IBM Research called Hermes (formerly NIL)
addresses these issues with the concept of "typestate". The basic idea
is that at COMPILE-TIME the current state of each variable is known at
each line in the code. For example, a pointer may be NIL, pointing to an
uninitialized data item, or pointing to an initialized item. The
operations allowed on a variable depend on its typestate. Only
initialized variables can appear on the right-hand side of assignments
or be sent in messages, for example. See the publications by Strom et al.
from IBM Research.

Pardon the appearance. I'm using an unidentified editor.

Arthur Goldberg
UCLA Computer Science Department
art@cs.ucla.edu
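To make the typestate idea concrete, here is a small Python sketch of a
checker that tracks, line by line and without running anything, whether
each variable is initialized. It handles only straight-line code with two
states and two operations. The operation tuples, the State enum and the
check function are assumptions for illustration and are not Hermes syntax
or the actual algorithm from the IBM work.

```python
from enum import Enum


class State(Enum):
    UNINIT = "uninitialized"
    INIT = "initialized"


def check(program, variables):
    """Return (line, variable) pairs where an uninitialized variable is used.

    program is a list of operations:
      ("assign", target, [sources])  target := expression over sources
      ("send", var)                  var is sent in a message
    """
    state = {name: State.UNINIT for name in variables}
    errors = []
    for line, op in enumerate(program, start=1):
        if op[0] == "assign":
            _, target, sources = op
            for src in sources:
                if state[src] is not State.INIT:
                    errors.append((line, src))
            state[target] = State.INIT      # target is initialized from here on
        elif op[0] == "send":
            _, var = op
            if state[var] is not State.INIT:
                errors.append((line, var))
        else:
            raise ValueError(f"unknown operation {op[0]!r}")
    return errors


program = [
    ("assign", "x", []),            # line 1: x := constant
    ("assign", "y", ["x"]),         # line 2: fine, x is initialized
    ("send", "z"),                  # line 3: z was never initialized
    ("assign", "w", ["z", "y"]),    # line 4: z is still uninitialized
]
errors = check(program, ["x", "y", "z", "w"])
for line, name in errors:
    print(f"line {line}: {name} is not initialized")
```

A real checker would also have to merge the states arriving from different
branches and loop iterations; that part is omitted here.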
bobdi@omepd (Bob Dietrich) (04/26/88)
In article <20085@think.UUCP> barmar@fafnir.think.com.UUCP (Barry Margolin) writes:
>In article <3401@omepd> bobdi@omepd.UUCP (Bob Dietrich) writes:
>>Just to be clear, there is only one official standard for Pascal right now,
>>embodied in the ANSI/IEEE and ISO standards.
>
>That's TWO official standards, since the ANSI Pascal standard and the
>ISO standard are not equivalent. I don't even think one is a subset
>of the other. I think they differ incompatibly in a couple of areas,
>although I don't know what they are offhand.
>
>Barry Margolin
>Thinking Machines Corp.
>
>barmar@think.com
>uunet!think!barmar

That's what I get for trying to simplify. Yes, there are actually two
standards. The ANSI/IEEE standard corresponds to level 0 of the ISO
standard, with about four wording differences. The wording differences are
in areas like how the file parameter is bound in the read (ANSI/IEEE
wording attempts to prohibit the file variable in "read(file_array[i], i,
j)" from changing).

The wording differences have little or no impact on most Pascal users. As
it is, the ISO Interpretations Subgroup (which I have participated in) is
moving the ISO Standard toward the intent of the ANSI/IEEE wording.

For those of you wondering, level 1 of ISO Pascal adds conformant arrays,
which is the only other area of difference between the Standards.

Bob Dietrich
Intel Corporation, Hillsboro, Oregon
(503) 696-4400 or 2092 (messages x4188,2111)
usenet: tektronix!ogcvax!omepd!bobdi
   or   tektronix!psu-cs!omepd!bobdi
   or   ihnp4!verdix!omepd!bobdi