[net.lang] Smart compilers

rcd@opus.UUCP (Dick Dunn) (12/14/84)

> I don't understand why people would consider a compiler "smart"
> if it were to replace the two nested empty 'for' loops with assignments
> of the loop variables.  I prefer a compiler to do the expected -- what
> I told it to do.  These kinds of 'optimizations' can make programming a pain
> in the neck...

I agree with this sentiment--but realize that it's as much wrapped up in
language design as in compiler implementation.  Languages such as C (and a
VERY few others) allow you sufficient capability to say just what you want
done that the attitude above makes sense.  On the other side, consider a
FORTRAN compiler for a vector machine.   FORTRAN is so crippled in terms of
being able to express what's "really going on" in its problem domain that
a decent (let alone smart) compiler has to do a lot of optimization.
-- 
Dick Dunn	{hao,ucbvax,allegra}!nbires!rcd		(303)444-5710 x3086
   ...Are you making this up as you go along?

rcd@opus.UUCP (Dick Dunn) (12/19/84)

>  Dick Dunn <rcd@opus > writes:
>  > FORTRAN is so crippled in terms of
>  > being able to express what's "really going on" in its problem domain that
>  > a decent (let alone smart) compiler has to do a lot of optimization.
> 
>  Compilers that optimize code can break programs regardless of what 
>  language that code is written in.  Optimizing compilers are very useful
>  as long as the optimization is optional (or at least can be turned
>  off when the code involved may be sensitive to optimization).

Not quite fair to compilers.  If optimizing breaks programs which conform
to the language definition, the optimizer is broken.  If optimizing breaks
a program because the program is non-conforming (as, say, a program which
depends on order of evaluation in most languages), the program is at fault.
Using C assignments to manipulate or busy-wait on device registers is a
gray area, which is why (1) optimizers zing people on such code
periodically and (2) a mechanism like `volatile' has been proposed.

FORTRAN is less susceptible to clashes with the optimizer due to the
device-register sort of hassle but more susceptible to poor programming
such as parameter/COMMON aliasing, misunderstanding of modifying
initialized COMMON, etc.

(And yes, I know--the program is equally broken whether it's the
compiler's fault or the programmer's.)

As to turning off optimization--sometimes you want certain optimizations on
and others off, though this can be annoying for compiler-writers.
-- 
Dick Dunn	{hao,ucbvax,allegra}!nbires!rcd		(303)444-5710 x3086
   ...Are you making this up as you go along?

ndiamond@watdaisy.UUCP (Norman Diamond) (12/20/84)

> >  Optimizing compilers are very useful
> >  as long as the optimization is optional (or at least can be turned
> >  off when the code involved may be sensitive to optimization).
> 
> Not quite fair to compilers.  If optimizing breaks programs which conform
> to the language definition, the optimizer is broken.  If optimizing breaks
> a program because the program is non-conforming (as, say, a program which
> depends on order of evaluation in most languages), the program is at fault.
> Using C assignments to manipulate or busy-wait on device registers is a
> gray area, which is why (1) optimizers zing people on such code
> periodically and (2) a mechanism like `volatile' has been proposed.
> 
> FORTRAN is less susceptible to clashes with the optimizer due to the
> device-register sort of hassle but more susceptible to poor programming
> such as parameter/COMMON aliasing, misunderstanding of modifying
> initialized COMMON, etc.

It is usually necessary for optimizations to break certain obscure points
that conform to the language definition, e.g. which order STATEMENTS are
executed in so that code can be moved out of loops.  Many useful optimizations
cannot be done when more than 99.9% of the language definition must be
respected.

In turn, this is the reason that optimizations must be optional.

And if these optimizations were not available, clamouring might start for a
return to assembly languages.

-- Norman Diamond

UUCP:  {decvax|utzoo|ihnp4|allegra|clyde}!watmath!watdaisy!ndiamond
CSNET: ndiamond%watdaisy@waterloo.csnet
ARPA:  ndiamond%watdaisy%waterloo.csnet@csnet-relay.arpa

"Opinions are those of the keyboard, and do not reflect on me or higher-ups."

oacb2@ut-ngp.UUCP (oacb2) (12/21/84)

> FORTRAN is less susceptible to clashes with the optimizer due to the
> device-register sort of hassle but more susceptible to poor programming
> such as parameter/COMMON aliasing, misunderstanding of modifying
> initialized COMMON, etc.

However, the IBM H Level FORTRAN Programmer's Guide does warn that code such
as

	DO 100 I = 1 TO 10
	IF (Y .GT. 0) X(I) = SQRT(Y)
    100 CONTINUE

(with X appropriately declared as an array of course) can cause an error
because it tries to take the square root of a negative number.  The optimizer
compiles this as if it were

	TEMP = SQRT(Y)
	DO 100 I = 1 TO 10
	IF (Y .GT. 0) X(I) = TEMP
    100 CONTINUE
-- 

	Mike Rubenstein, OACB, UT Medical Branch, Galveston TX 77550

jlg@lanl.ARPA (12/21/84)

> However, the IBM H Level FORTRAN Programmer's Guide does warn that code such
> as
> 
> 	DO 100 I = 1 TO 10
> 	IF (Y .GT. 0) X(I) = SQRT(Y)
>     100 CONTINUE
> 
> (with X appropriately declared as an array of course) can cause an error
> because it tries to take the square root of a negative number.  The optimizer
> compiles this as if it were
> 
> 	TEMP = SQRT(Y)
> 	DO 100 I = 1 TO 10
> 	IF (Y .GT. 0) X(I) = TEMP
>     100 CONTINUE

This particular example should be optimized as

      IF ( Y.GT.0 ) TEMP=SQRT(Y)
      DO 100 I = 1,10         !THERE ARE NO 'TO' DELIMITERS IN FORTRAN
         X(I)=TEMP            !FOLLOW INDENTATION STANDARDS PLEASE
100   CONTINUE

A good data flow analysis would have seen this.  However, in general these
types of optimizations are called 'unsafe', for obvious reasons.  I don't
think unsafe optimizations are allowed by the standard since they change
the meaning of the code (in particular, they cause errors when none would
have normally occurred).  IBM has simply stepped out of line here.  IBM is
not alone, most vendors tend to have some violations of the standard lurking
around in the compilers.

The given example has a safe optimization available to it.  It is possible
to come up with cases that don't, so in general this sort of optimization
should be considered unsafe (unless you have a really good data flow
analyzer available to guarantee safety).

------------------------------------------------------------------------------
The greatest derangement of the mind is to believe in something
because one wishes it to be so - Louis Pasteur

                                              James Giles

cliff@unmvax.UUCP (12/21/84)

> > However, the IBM H Level FORTRAN Programmer's Guide does warn that code such
> > as
> > 
> > 	DO 100 I = 1 TO 10
> > 	IF (Y .GT. 0) X(I) = SQRT(Y)
> >     100 CONTINUE
> > 
> > (with X appropriately declared as an array of course) can cause an error
> > because it tries to take the square root of a negative number.  The optimizer
> > compiles this as if it were
> > 
> > 	TEMP = SQRT(Y)
> > 	DO 100 I = 1 TO 10
> > 	IF (Y .GT. 0) X(I) = TEMP
> >     100 CONTINUE
> 
> This particular example should be optimized as
> 
>       IF ( Y.GT.0 ) TEMP=SQRT(Y)
>       DO 100 I = 1,10         !THERE ARE NO 'TO' DELIMITERS IN FORTRAN
>          X(I)=TEMP            !FOLLOW INDENTATION STANDARDS PLEASE
> 100   CONTINUE

C Not quite.  You will wind up setting X(I) to zero or the last value of TEMP
C when Y is less than or equal to zero.  How about something that would really
C be correct, like:
C
	IF (Y.LE.0) GO TO 100
	TEMP = SQRT(Y)
	DO 100 I = 1, 10
	    X(I) = TEMP
100	CONTINUE
C
C The greatest derangement of the mind is to believe in something
C because one wishes to be Louis Pasteur.
C
C 	--Cliff [Matthews]
C	{purdue, cmcl2, ihnp4}!lanl!unmvax!cliff
C	{csu-cs, pur-ee, convex, gatech, ucbvax}!unmvax!cliff
C	4744 Trumbull S.E. - Albuquerque  NM  87108 - (505) 265-9143

mangoe@umcp-cs.UUCP (Charley Wingate) (12/22/84)

In article <18397@lanl.ARPA> jlg@lanl.ARPA writes:

>> However, the IBM H Level FORTRAN Programmer's Guide does warn that code
>> such as
>> 
>> 	DO 100 I = 1 TO 10
>> 	IF (Y .GT. 0) X(I) = SQRT(Y)
>>     100 CONTINUE
>> 
>> (with X appropriately declared as an array of course) can cause an error
>> because it tries to take the square root of a negative number.  The
>> optimizer compiles this as if it were
>> 
>> 	TEMP = SQRT(Y)
>> 	DO 100 I = 1 TO 10
>> 	IF (Y .GT. 0) X(I) = TEMP
>>     100 CONTINUE

>This particular example should be optimized as
>
>      IF ( Y.GT.0 ) TEMP=SQRT(Y)
>      DO 100 I = 1,10         !THERE ARE NO 'TO' DELIMITERS IN FORTRAN
>         X(I)=TEMP            !FOLLOW INDENTATION STANDARDS PLEASE
>100   CONTINUE

Wrong again.  A correct optimization would be

      If (Y.GT.0) Temp=Sqrt(Y)
      Do 100 I=1,10
         If (Y.GT.0) X(I)=Temp
  100 Continue

If Y is negative, X should be unchanged, according to the original program.
My feeling is that an optimizer should produce the same "action" on some
well-defined level as the original unoptimized code.  Well-defined in this
case should be equivalent to well-documented.

Charley Wingate   umcp-cs!mangoe

jlg@lanl.ARPA (12/22/84)

> Wrong again.  A correct optimization would be
> 
>       If (Y.GT.0) Temp=Sqrt(Y)
>       Do 100 I=1,10
>          If (Y.GT.0) X(I)=Temp
>   100 Continue

Quite right!  My first version was wrong.  But 'Y' is a loop invariant, the
correct optimization is:

      IF (Y.GT.0) THEN
         TEMP=SQRT(Y)
         DO 100 I=1,10
            X(I)=TEMP
100      CONTINUE
      ENDIF

This has the advantage of not going through the loop at all if the
assignments would all have been skipped.  All these examples show the point
I originally made: these optimizations are unsafe unless you have a REALLY
GOOD data flow analyzer.

------------------------------------------------------------------------------
The greatest derangement of the mind is to believe in something
because one wishes it to be so - Louis Pasteur

                                              James Giles

chris@umcp-cs.UUCP (Chris Torek) (12/23/84)

> It is usually necessary for optimizations to break certain obscure points
> that conform to the language definition, e.g. which order STATEMENTS are
> executed in so that code can be moved out of loops.

Depends on what you mean.  If the loop is guaranteed to be executed at
least once, then moving loop-invariant expressions out of loops is
fine.  If the loop might not get executed, then you can always optimize

	for i := j to k { ... invariant-expr ... }

into

	if j <= k then {
		temp := invariant-expr;
		i := j; do { ... temp ... } while ++i <= k;
	}

(please excuse the syntax).  Note that the "do" is probably slightly
more efficient than the "for", and should probably be used to avoid
degrading performance when j always <= k.

Of course, if you allow such things as interrupts which modify variables
into your language, then the optimizer must be either MUCH smarter (has
to read and understand all your interrupt handlers [.5 :-)]) or you need
hints like ``volatile''.
-- 
(This line accidently left nonblank.)

In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

oacb2@ut-ngp.UUCP (oacb2) (12/23/84)

> It is usually necessary for optimizations to break certain obscure points
> that conform to the language definition, e.g. which order STATEMENTS are
> executed in so that code can be moved out of loops.  Many useful
>  optimizations cannot be done when more than 99.9% of the language
> definition must be respected.

> In turn, this is the reason that optimizations must be optional.

> And if these optimizations were not available, clamouring might start for a
> return to assembly languages.

NO! NO! NO!  An optimizer that changes the meaning of the code is broken.
I agree that there is some discussion needed as to just what code should
mean, particularly in languages like C which are often used as fancy
assemblers, but at a very minimum an optimizer should never affect the
results of a program provided

	- no variables are changed by asynchronous processes unknown to the
	  compiler.

	- no use is made of variables whose values are undefined by the
	  language specification (e.g. the DO loop index in FORTRAN after
	  normal completion of the loop).

	- timing is not considered a result of the program.

If an optimizer cannot be written that does not change the meanings of
programs with these minimal requirements we are better off without the
optimization.  I'm more than a little tired of the troubles I've had
debugging legal programs because some turkey wrote an optimizer that
shaves a few microseconds off, but screws up in some rare case that I'm
just lucky enough to hit.

By the way, when I first ran into the problem I posted with DO loops in
FORTRAN, I considered it to be a bug.  Now it's been documented (for at
least 10 years), but I still regard it as a bug.  It's simply too outrageous
for a statement like

	IF (Y .GT. 0) X(I) = SQRT(Y)

to ever try to take the square root of a negative for me to accept it.

Unfortunately, too many (read: every one I've ever dealt with) computer
manufacturers think that it's legitimate for their software to do all
kinds of outrageous things if they just bury a sentence about it in the
documentation.  Another goody from IBM, this time from the PL/I Programmer's
Guide:

	The use of bit strings in a multiprocessing environment may,
	occasionally, lead to incorrect results.
-- 

	Mike Rubenstein, OACB, UT Medical Branch, Galveston TX 77550

rcd@opus.UUCP (Dick Dunn) (12/25/84)

>...It is usually necessary for optimizations to break certain obscure points
> that conform to the language definition, e.g. which order STATEMENTS are
> executed in so that code can be moved out of loops.  Many useful optimizations
> cannot be done when more than 99.9% of the language definition must be
> respected.

Or, rephrasing the above ">", sometimes you get:

	The optimization was a success, but the program died.

I can't see this is anything other than a conflict resulting from either a
bad (or simply incomplete) language definition or a broken compiler.  But
then I've always had trouble dealing with compromises between efficiency
and correctness.
-- 
Dick Dunn	{hao,ucbvax,allegra}!nbires!rcd		(303)444-5710 x3086
   ...Nothing left to do but smile, smile, smile.

herbie@watdcsu.UUCP (Herb Chong, Computing Services) (12/25/84)

> Another goody from IBM, this time from the PL/I Programmer's
> Guide:
> 
> 	The use of bit strings in a multiprocessing environment may,
> 	occasionally, lead to incorrect results.

Taken out of context, this statement is indeed very silly.  You have to
read the paragraphs before and after this to understand that the PL/I
program can be easily communicating with another process sharing the
same address space and possibly written in another language.  No matter
what the compiler does, the silliness of the other program can wipe out
the best code generation.

The other thing is that the 370 architecture does not allow bit
addressing.  If you want to examine or modify a bit, you have to fetch
at least 1 byte and store at least one byte.  That byte you are
interested in may easily contain other bits that are being used by
other processes.  The PL/I language requires that if you are modifying
a bit string shared between two processes (and therefore in the same
address space), you must provide your own synchronization although it
doesn't really come out and say so in one place.

I might also mention (although this is nitpicking) that if you were
referring to the PL/I Checkout or Optimizing compilers, the manual is the
PL/I Language Reference, where this is explained in detail.

Herb Chong...

I'm user-friendly -- I don't byte, I nybble....

gnu@sun.uucp (John Gilmore) (12/26/84)

> James Giles says:
>               My first version was wrong.  But 'Y' is a loop invariant, the
> correct optimization is:
> 
>       IF (Y.GT.0) THEN
>          TEMP=SQRT(Y)
>          DO 100 I=1,10
>             X(I)=TEMP
> 100      CONTINUE
>       ENDIF
> 
> This has the advantage of not going through the loop at all if the
> assignments would all have been skipped.  All these examples show the point
> I originally made: these optimizations are unsafe unless you have a REALLY
> GOOD data flow analyzer.

You're absolutely right!  Had you had a good data flow analyzer, it
would have detected that your optimization had changed the value of 'I'
on exit from this piece of code.

msb@lsuc.UUCP (Mark Brader) (12/26/84)

Ho hum, all of you are wrong.  The CORRECT optimization for:

	DO 100 I = 1 TO 10
	IF (Y .GT. 0) X(I) = SQRT(Y)
	100 CONTINUE

Is, in Ratfor syntax:

	IF (Y > 0) {
		TEMP = SQRT(Y)
		DO I = 1,10
			X(I) = TEMP
	}
	ELSE
		I = 11

(In some implementations the number 11 will be something else, most
likely 10, but leaving I unchanged is clearly wrong!)

Moral:  This is not only trickier than it looks, it's trickier than that!

Mark Brader

jeff@gatech.UUCP (Jeff Lee) (12/28/84)

> It is usually necessary for optimizations to break certain obscure points
> that conform to the language definition, e.g. which order STATEMENTS are
> executed in so that code can be moved out of loops.  Many useful optimizations
> cannot be done when more than 99.9% of the language definition must be
> respected.
> 
> In turn, this is the reason that optimizations must be optional.

This is a truly incredible statement.  I'm sure we can come up with many more
optimizations that work real well if we only respect 99% of the language
definition, but how useful are those "optimizations"?  It seems to me that
when your optimizer gets finished with your code, it is no longer compiling
FORTRAN, Pascal, C, etc... but some dialect that doesn't do what you intended
it to do.  I don't want any compiler (and optimizer) that doesn't "respect"
all of the language definition.  That is simply begging and pleading for
trouble.

Jeff Lee
CSNet:	Jeff @ GATech		ARPA:	Jeff.GATech @ CSNet-Relay
uucp:	...!{akgua,allegra,rlgvax,sb1,unmvax,ulysses,ut-sally}!gatech!jeff

ndiamond@watdaisy.UUCP (Norman Diamond) (12/31/84)

> Ho hum...  The CORRECT [sic] optimization for:
> 
> 	DO 100 I = 1 TO 10
> 	IF (Y .GT. 0) X(I) = SQRT(Y)
> 	100 CONTINUE
> 
> Is, in Ratfor syntax:
> 
> 	IF (Y > 0) {
> 		TEMP = SQRT(Y)
> 		DO I = 1,10
> 			X(I) = TEMP
> 	}
> 	ELSE
> 		I = 11
> 
> (In some implementations the number 11 will be something else, most
> likely 10, but leaving I unchanged is clearly wrong!)

The else case can leave anything at all in I and still be correct.
The value of I is undefined after completion of the DO loop, and when
optimized it remains undefined after the IF statement (whether the
then-branch or ELSE-branch is taken).  The most optimal setting of
I is to leave it alone, requiring 0 machine instructions.

Otherwise, this suggestion was correct, as were several others.

-- Norman Diamond

UUCP:  {decvax|utzoo|ihnp4|allegra|clyde}!watmath!watdaisy!ndiamond
CSNET: ndiamond%watdaisy@waterloo.csnet
ARPA:  ndiamond%watdaisy%waterloo.csnet@csnet-relay.arpa

"Opinions are those of the keyboard, and do not reflect on me or higher-ups."

bsa@ncoast.UUCP (Brandon Allbery (the tame hacker on the North Coast)) (01/02/85)

It occurs to me that there are two ways to look at optimizers:

	1) That an optimizer optimizes, not the language code, but ASSEMBLER
	   code, and hence does not need to follow the language standard;

	2) That an optimizer must follow the language standard exactly,
	   and hence should optimize the language code itself.

If you accept (1), as every optimizer I've seen is designed to do, you
are likely to see an optimizer that will occasionally fail.  If (2),
there is little optimization possible, especially in a language like
Pascal where you can't manipulate the machine directly.

All you who complain about optimizers breaking code:  look at the current
discussion of the correct optimization of a piece of ForTran code.  No one
has yet proposed an optimization (type 2) that does the same thing as the
original code.  And if it's hard in ForTran, how much harder in Assembler?

Let's face it, guys -- placing any limitations (i.e. standards) on a
programming language is going to make an executing program in that language
slower.  Optimizing is often simply ignoring some of the standards and
removing their effects from the generated code.

--bsa
-- 
  Brandon Allbery @ decvax!cwruecmp!ncoast!bsa (..ncoast!tdi1!bsa business)
	6504 Chestnut Road, Independence, Ohio 44131   (216) 524-1416
    Who said you had to be (a) a poor programmer or (b) a security hazard
			     to be a hacker?

andrew@orca.UUCP (Andrew Klossner) (01/04/85)

} It occurs to me that there are two ways to look at optimizers:
}
}	1) That an optimizer optimizes, not the language code, but ASSEMBLER
}	   code, and hence does not need to follow the language standard;
}
}	2) That an optimizer must follow the language standard exactly,
}	   and hence should optimize the language code itself.
}
} If you accept (1), as every optimizer I've seen is designed to do, you
} are likely to see an optimizer that will occasionally fail.  If (2),
} there is little optimization possible, especially in a language like
} Pascal where you can't manipulate the machine directly.

This shows a lack of familiarity with the computer science behind
optimization.

Systems which improve the assembler output from a compiler are commonly
called "peephole" optimizers, because they never look at more than a
very small part of the program (they look at the program through a
"peephole").  Unix is the only system I've encountered in which the
standard language optimizer is just an assembly code improver.

The more traditional (read "mainframe") optimization strategy is to
convert the entire program into an intermediate form which preserves
the meaning of the code and yet is amenable to rearrangement into a
more efficient program with equivalent meaning.  Once this is done, the
intermediate code is translated into assembler or machine code.
So-called "global" optimizers examine the entire program, using
"data-flow" algorithms which gather information by making several passes
over the intermediate code; this information is then used to make
decisions about how the program can be improved without changing its
meaning.

Pascal programs are particularly amenable to this sort of optimization.
The standard example is the assignment "a[i]:=a[i]+1".  This cannot be
expressed any more efficiently in Pascal, but in a good intermediate
code the translation can be arranged to reflect the fact that
computation of the address of a[i] need only be done once, and that an
"increment word in memory" instruction can be used.

} Let's face it, guys -- placing any limitations (i.e. standards) on a
} programming language is going to make an executing program in that language
} slower.  Optimizing is often simply ignoring some of the standards and
} removing their effects from the generated code.

Not at all.  An implementor who claims that she had to revise the
language in order to write an optimizer is being inexcusably lazy.

Despite the recent discussions about well-known optimizers breaking
programs, such behavior is considered incorrect by workers in the
field.  A good and readable treatment of optimization is Aho and
Ullman's "Principles of Compiler Design", which has become the standard
introductory compiler text.  The chapters on optimization emphasize
that an optimization which changes the semantics of the program is not
valid.

  -- Andrew Klossner   (decvax!tektronix!orca!andrew)       [UUCP]
                       (orca!andrew.tektronix@csnet-relay)  [ARPA]

cem@intelca.UUCP (Chuck McManis) (01/04/85)

Again on the subject of "smart" compilers, specifically FORTRAN-IV on a DEC
2060XE running TOPS-20. The following code segment is approximate but the 
situation was the same. 

	CALL FOO(X,0.,Z)
	...
	X2 = 0.
	WRITE(5,10) X2
10	FORMAT(1X,F10.2)
	END
	SUBROUTINE FOO(X,Y,Z)
	...
	Y=1.0
	RETURN
	END
Where the compiler optimized all constant references to 0.0 to a memory
location containing 0.0 and the subroutine changed that value to 1.0. All
further references to 0 produced the value 1.  Not too swift.  (I know it is
poor style, and have since stopped using constants in subroutine calls.)

--Chuck
 

-- 
                                            - - - D I S C L A I M E R - - - 
{ihnp4,fortune}!dual\                     All opinions expressed herein are my
        {qantel,idi}-> !intelca!cem       own and not those of my employer, my
 {ucbvax,hao}!hplabs/                     friends, or my avocado plant. :-}

alexc@dartvax.UUCP (Alex Colvin) (01/06/85)

(intelca.472) mentions what can only be called a horrendous compiler
bug.  That one has been known since roughly FORTRAN II -- namely
passing constants by reference.  The solution adopted by most such
languages (FORTRAN, PL/I) is to always pass copies of the constant.

Please understand that the aforementioned compiler is NOT optimizing.
It is compiling incorrect code.

The DTSS PL/I compiler does the correct optimization in this case.
Iff the interprocedure analysis detects that a (reference) parameter
is not modified, then it passes constant arguments by reference,
without forcing a copy.  This is done ONLY if it is known to be safe.

As to passing constants, I see nothing wrong with it.  I prefer it to
passing variables.  At least with a constant you can see what you're
getting.  True, magic constants (e.g. 501) should be given some sort
of explanatory name (e.g. StreetAddress).  But I see little point in
defining ZERO as a name for 0.  Pascal textbooks are particularly
prone to this sort of foolishness.  One defined NINE as 9.  If
anything they should have defined it as RadixLessOne.  That would
cause less surprise when some idiot changes it to 7.

robert@cheviot.UUCP (Robert Stroud) (01/08/85)

<This line is a figment of your imagination>

There has been a lot of discussion recently about how to optimise
the Fortran program,

      DO 100 I=1,10
         IF (Y .GT. 0) X(I) = SQRT(Y)
  100    CONTINUE

Most of the suggestions have been wrong, and even those that are right
would not work in the presence of certain pathological cases of aliassing
and side-effects. For example, suppose SQRT was a user function which
modified the COMMON block variable I as a side-effect.  Or suppose SQRT
gave a negative result and Y was EQUIVALENCE'd onto X(3). Could even
a really good data flow analyser cope with such pathologies and would it
be reasonable to expect it to be able to cope?? If this was a SUBROUTINE
fragment and X and Y were parameter or COMMON block elements, then whether
the optimisation was valid or not might well depend on precisely how the
subroutine was called; in other words, it would work sometimes but not 
necessarily always.

I agree completely with the principle that optimisers should not modify
the semantics of programs, but there is a grey area where things get
very tricky. I believe that there are a lot of little known rules for
programming in FORTRAN which are part of the language definition and
which attempt to prevent these pathological cases arising. They only
exist in order to guarantee that certain optimisations will always
be possible. I am thinking of a paper called "Serious Fortran", (I'm
afraid I can't give a more precise reference), and things like the
restrictions on modifications to loop variables in extended DO loop 
ranges and the apparently little known fact that the DO loop variable
is UNDEFINED after executing a loop.

[This may be historical - can any Fortran expert shed more light??]

As an aside, in Pascal you are not allowed to modify a FOR loop variable
or alter the value of a WITH expression within the bounds of the statement.
But does anyone know of a Pascal compiler that actually checks for this
and prohibits even such blatant violations as...

       WITH P^ DO
       BEGIN
          P^ := P^.next;
       END;

In practice these rules are not enforced (or indeed cannot be enforced)
and so programs which are strictly speaking illegal will compile without
error but will give unexpected results when optimised. An obvious solution
to the problem is to try and introduce syntactic restrictions which make
aliassing and side-effects either impossible or easily detectable. This
was the approach taken by the designers of the language Euclid, but what 
started as Pascal got extremely complicated and I think it is fair to say that
this approach is a lot harder than it looks. Aliases and side-effects will
be with us as long as we use languages with variables and assignment.

But again, even if in theory all these nasty things are lurking beneath
the surface ready to bite us, in practice do we really run into such
pathological cases?  If things were really that bad, we wouldn't be able
to write any programs that worked (:-)!

Does anyone have any good horror stories about being bitten by an alias
or a side-effect? [Preferably in languages with proper abstraction - ie
BASIC is excluded!!]

Robert Stroud,
Computing Laboratory,
University of Newcastle upon Tyne

jhf@lanl.ARPA (01/10/85)

> > I thought Fortran-77 required constant actual parameters to be
> > "copied in" rather than passed by reference?
> 
> FORTRAN parameters are always pass-by-reference.
> -- 
> D Gary Grady
> Duke University Computation Center, Durham, NC  27706

Absolutely not.  Both reference and value-result are acceptable parameter
passing methods for Fortran.  Programs that are sensitive to the difference,
i.e., that create aliases to actual parameters ("dummy arguments") which
they redefine, are not standard conforming.  This restriction, in my opinion,
is entirely appropriate.  Efficiency of parameter passing is of the greatest
importance to the overall quality of a compiler's target code, and
architectural features and other circumstances may make one of these methods
much more desirable than the other.  The compiler writer should be free to
choose the appropriate technique, and not be restricted to one of them
for the sake of guaranteeing the effects of some bad programs.

I might note that Ada imposes a similar restriction on parameters, but,
incredibly, the Pascal standard does not.  It requires call-by-reference
for var parameters!  (At least, it did the last time I looked.  If the
situation has changed, I stand duly corrected and chastised.)  I think
this is a huge mistake.  Not only does it unreasonably restrict
implementations, but it makes impossible the removal of a wart in the
language, that components of packed arrays or records may not be passed
as var parameters.  This restriction is there, of course, because there
will generally not be a reasonable way to implement call-by-reference for
objects that are not aligned on an addressable bit boundary.  This
directly conflicts with the notion that packed types differ from their
unpacked counterparts only in their storage requirements and access times.
What the Pascal standard should have done is to restrict the aliasing
of var parameters in a manner similar to Fortran or Ada, and then to
remove the restriction on passing components of packed structures.
(Most compilers would implement this by copy-in-copy-out on the calling
side, where necessary.)

Joe Fasel
Los Alamos National Laboratory
jhf@lanl.{arpa,uucp}

johnl@ima.UUCP (01/14/85)

>For example, suppose SQRT was a user function which
>modified the COMMON block variable I as a side-effect. Or suppose, SQRT
>gave a negative result and Y was EQUIVALENCE'd onto X(3).

Ah, but it's not.  The Fortran 77 committee gave a lot of thought to making
sure that useful optimizations would be possible.  One of the things they did
was to define a list of "intrinsic" functions of which SQRT was one, and to say
that they are well-behaved.  If for some masochistic reason you want to define
your own version of SQRT, you have to declare it in a special EXTERNAL
statement, and the compiler is duly warned to be careful.

John Levine, ima!johnl

PS:  So how does this all work on a Perq?

joe@petsd.UUCP (Joe Orost) (01/15/85)

In article <206@cheviot.UUCP> robert@cheviot.UUCP (Robert Stroud) writes:
>There has been a lot of discussion recently about how to optimise
>the Fortran program,
>
>      DO 100 I=1,10
>         IF (Y .GT. 0) X(I) = SQRT(Y)
>  100    CONTINUE
>
>Most of the suggestions have been wrong, and even those that are right
>would not work in the presence of certain pathological cases of aliassing
>and side-effects. For example, suppose SQRT was a user function which
>modified the COMMON block variable I as a side-effect. Or suppose, SQRT
>gave a negative result and Y was EQUIVALENCE'd onto X(3). Could even
>a really good data flow analyser cope with such pathologies and would it
>be reasonable to expect it to be able to cope??

No, SQRT cannot be a user function unless it is declared as "EXTERNAL".  Our
compiler uses this info to keep a list of functions that are "optimizable",
that is, they have no side effects and return the same result for the same
argument.

					regards,
					joe

--
Full-Name:  Joseph M. Orost
UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
Phone:      (201) 870-5844
Location:   40 19'49" N / 74 04'37" W

robert@cheviot.UUCP (Robert Stroud) (01/18/85)

In article <206@cheviot.UUCP> robert@cheviot.UUCP (that's me!) wrote:

>                    For example, suppose SQRT was a user function which
>modified the COMMON block variable I as a side-effect. Or suppose, SQRT
>gave a negative result and Y was EQUIVALENCE'd onto X(3). Could even
>a really good data flow analyser cope with such pathologies and would it
>be reasonable to expect it to be able to cope??

Judging by a couple of responses to this, I didn't make myself clear,
for which I apologise.

I was not suggesting writing another version of SQRT with the same name
but an eccentric behaviour.  I was asking how an optimising compiler would
cope if the code was identical except that, instead of calling SQRT, the
programmer called an arbitrary function of his own about which the compiler
knew nothing.  My second example, aliasing Y onto X(3), would fail with
a completely standard SQRT function in any case, because Y would no longer
be a loop invariant.

However, getting away from the nitty-gritty of particular pathological cases,
I was really trying to make a more general point.

It would seem from other discussions that examples like mine are banned
by the small print of the language standard.

The problem is that such deviations are very hard, if not impossible, for a
conventional compiler to detect, and the compiler is therefore at liberty
to do what it likes with "non-standard" code.

Perhaps because I am a European (:-) I worry about this. If it is possible
to write a non-standard program by accident which is incorrectly optimised
by a compiler which conforms to the standard, then we have a potentially
dangerous situation on our hands. Suppose this program is let loose on some
application where lives are at risk? Even if the application is not that
critical, a lot of time and money could be wasted tracking down something
which is not really a compiler bug.

Whether this will be a problem or not depends on how easy it is to write
a non-standard program by accident, and whether such things are rare anyway.

It would presumably be possible to write something like "lint" only more so
which crawled all over a large program looking for deviations from the
standard that an ordinary compiler would have neither the time nor the
inclination to discover.

Any opinions on this? I don't see why the discussion should be restricted
to Fortran. Probably the only way of eliminating aliases and such-like
altogether would be to program in a purely functional language without
assignment or side-effects.

Robert Stroud,
Computing Laboratory,
University of Newcastle upon Tyne

ARPA robert%cheviot%newcastle.mailnet@mit-multics
UUCP ...!ukc!cheviot!robert

rcd@opus.UUCP (Dick Dunn) (01/23/85)

> It would seem from other discussions that examples like mine are banned
> by the small print of the language standard.
> 
> The problem is that such deviations are very hard if not impossible to
> detect by a conventional compiler which is therefore at liberty to do
> what it likes with "non-standard" code.

That may not be the sum of the problem.  A lot rests on how much "small
print" exists in the language standard.  One would like the union of {fine
points which ARE understood by the average programmer} with {fine points
detected by the compiler} to cover all of the nasty fine points.

> Perhaps because I am a European (:-) I worry about this. If it is possible
> to write a non-standard program by accident which is incorrectly optimised
> by a compiler which conforms to the standard, then we have a potentially
> dangerous situation on our hands. Suppose this program is let loose on some
> application where lives are at risk? Even if the application is not that
> critical, a lot of time and money could be wasted tracking down something
> which is not really a compiler bug.

[I want to pick nits and say that if the program is non-standard, it is not
the compiler which is doing "incorrect optimization" but the programmer who
is doing "incorrect programming".]
It seems to be far too difficult to create a real language, useful for
writing significant programs, which has no dark corners.

> It would presumably be possible to write something like "lint" only more so
> which crawled all over a large program looking for deviations from the
> standard that an ordinary compiler would have neither the time nor the
> inclination to discover.

But if it's not part of the compiler, it is much less likely to be used,
particularly if it's as noisy as lint.  Moreover, if it's more thorough
than lint, it will probably be even noisier.  Finally, if the compiler
doesn't have the time (and well it might not), that speaks poorly of the
chance that people will take the extra time to run it.
-- 
Dick Dunn	{hao,ucbvax,allegra}!nbires!rcd		(303)444-5710 x3086
   ...Keep your day job 'til your night job pays.