[comp.arch] Optimization vs. the programmer

lvc@danews.UUCP (04/07/87)

I'm all for optimization but ...

There is a certain computer (that I still use) that
requires the statement:

	n = n;

in order to force the hardware to work right for certain
values of n.  Well, fix the **** computer, you say, and
so do I.  But that will be a while and I can't wait.

If I had an optimizing compiler that was smart (?$!?)
enough to excise this I couldn't make the machine work
right.

Please dear compiler writers, make the compiler do what I
tell it (or at least have the courtesy to warn me about it).
If hardware was perfect then I wouldn't have a gripe, but
it ain't so I do.
-- 

	Larry Cipriani, AT&T Network Systems, Columbus OH, (614) 860-4999

guy@gorodish.UUCP (04/07/87)

>Please dear compiler writers, make the compiler do what I
>tell it (or at least have the courtesy to warn me about it).
>If hardware was perfect then I wouldn't have a gripe, but
>it ain't so I do.

That's a nice general request, and as such nearly useless.  What does
"do what I tell it" mean?  To what level of detail are you "telling
it"?  Do you think moving invariant computations out of loops is not
"doing what you tell it"?  What about strength reductions?  What
about eliminating assignments to dead variables?

Compiler writers simply can't anticipate every single potential
hardware problem that may crop up and that may require you to put in
code that is in principle redundant.

rpw3@amdcad.UUCP (04/08/87)

In article <479@danews.ATT.COM> lvc@danews.ATT.COM (Larry Cipriani) writes:
+---------------
| There is a certain computer (that I still use) that requires the statement:
| 	n = n;
| in order to force the hardware to work right for certain values of n...
| If I had an optimizing compiler that was smart (?$!?) enough to excise this
| I couldn't make the machine work right.
+---------------

I can't agree. Instead, *tell* the compiler that you need to do this.
In ANSI C (or any useful optimizing C compiler), simply say:

	volatile int n;
	...
	n = n;

And the compiler will *always* fetch & store "n" when referenced.


Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun,attmail}!redwood!rpw3
ATTmail:  !rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA  94403

lvc@danews.UUCP (04/08/87)

In article <16294@sun.uucp>, guy%gorodish@Sun.COM (Guy Harris) writes:

>That's a nice general request, and as such nearly useless.  What does
>"do what I tell it" mean?  To what level of detail are you "telling
>it"?  Do you think moving invariant computations out of loops is not
>"doing what you tell it"?  What about strength reductions?  What
>about eliminating assignments to dead variables?
>
>Compiler writers simply can't anticipate every single potential
>hardware problem that may crop up and that may require you to put in
>code that is in principle redundant.

Thanks for your comments.  Your points are well taken.  I guess
I meant at the "source code level" but I see that isn't very useful.

My preference is for a program (perhaps a phase of an optimizing
compiler) that would take source code and generate optimized *source*
code.  Additionally, messages saying why the transformations are
better would be great to have.  I believe such programs are available
for Fortran (or do these concentrate on vectorization) but I've never
seen one for C.

I want to be able to use assembly language debuggers and follow
a *source* listing that I have in front of me on my desk.
Then I would not mind if the optimizer did all these things.

Granted some optimizations occur at the assembly language level,
but those wouldn't affect my ability to follow the code very
much.

What is strength reduction?
-- 

	Larry Cipriani, AT&T Network Systems, Columbus OH, (614) 860-4999

stevev@tekchips.UUCP (04/09/87)

In article <484@danews.ATT.COM>, lvc@danews.ATT.COM (Larry Cipriani) writes:
> My preference is for a program (perhaps a phase of an optimizing
> compiler) that would take source code and generate optimized *source*
> code.  Additionally, messages saying why the transformations are
> better would be great to have.  I believe such programs are available
> for Fortran (or do these concentrate on vectorization) but I've never
> seen one for C.

Even though a source-level "optimization" is machine-independent in
that its legality does not depend on the architecture, whether any
particular optimization will improve the quality of the code cannot
be known unless you know something about the architecture.  The types
of source-level optimizations done for a parallel architecture might
be quite different from those for a traditional uniprocessor.

An example of an "optimization" that could turn out to decrease code
quality is common subexpression elimination.  Saving the value
of the common subexpression requires the allocation of an extra register.
It's possible that the negative effect of tying up the extra register
more than cancels what is gained by eliminating the redundant
computation.  You need information about the particular architecture
you are targeting in order to make this decision.

Another example: I programmed on an architecture once in which the fastest
way to produce certain constants--due to the architecture's addressing modes--
was to "unfold" the constant into two simpler ones (e.g., 1072 becomes
"134 lsh 3").  Thus, even constant folding is not necessarily always the
optimal thing to do.

A microarchitecture that I was marginally familiar with has such a fast
multiplier that the fastest way to perform a shift of more than some very
small number of bits was to transform into an equivalent multiplication by
a power of two.  On such an architecture, a strength reduction that
transforms a multiplication into a shift might DECREASE code quality.

All this aside, IF you have a pretty good idea what kind of architecture
you're targeting, source-level optimization can generally be quite
effective.

		Steve Vegdahl
		Computer Research Lab
		Tektronix Labs
		Beaverton, Oregon

jfh@killer.UUCP (04/09/87)

The request seems fairly obvious to me.  One writer wrote about how
his line
	n = n;
was necessary for the hardware to work.  I have seen compilers drop that
one on the floor.  There was also some discussion about adding a new
storage class to C named 'volatile' or some such, so that references of this
seemingly weird type aren't ignored.  The device driver code for some
hardware requires you to read from a silo more than one time to get the
desired piece of information.  If I write

	status = *silo;
	status = *silo;

on a real *smart* compiler, the second assignment may never happen.
I might get real clever and write

	status = *silo;
	status = 0;
	status = *silo;

only to find the first two lines dropped.

If I am stupid enough to write

	n = n;

The compiler should be smart enough to code the turkey.  Compilers
are for compiling, program verifiers are for 'picking the pieces of
fluff' from your programs.

- John.

Disclaimer:
	I hold you entirely responsible for what I write.

grunwald@uiucdcsm.UUCP (04/10/87)

Strength reduction is replacement of operations such as "i = j * 8" by
"i = j << 3", or "i = j % 8" by "i = j & 0x3" (if j is unsigned).

As for source-to-source translation: not for me.  A lot of the optimizations
(dead-code analysis, colored register allocation, etc.) aren't at the level
of the source code.
	It is this class of optimization which kills statements such as
"n = n;". You could recognize ``useless'' statements like this & warn the
person about it, but I think it would be better to simply disable the
optimizations at that point and/or code that little-needed routine
in assembly code.
	Vector & concurrentization optimizations can usually be done in
a source-to-source fashion, but this stuff would just fill your source files
up with meaningless garbage which is impossible to edit/maintain. The
only ``source'' which it would be reasonable to convert it to is assembler.

desj@brahms.UUCP (04/12/87)

In article <3300003@uiucdcsm> grunwald@uiucdcsm.cs.uiuc.edu writes:
>
>Strength reduction is replacement of operations such as "i = j * 8" by
>"i = j << 3", or "i = j % 8" by "i = j & 0x3" (if j is unsigned).
                                  ^^^^^^^^^^^   ^^^^^^^^^^^^^^^^
   I'm sure you mean "i = j & 0x7".  But, more importantly, this is the
right remainder to compute even if j is signed (assuming two's-complement,
of course).  It's too bad more people (especially hardware designers!)
don't realize this.

   -- David desJardins

dik@mcvax.UUCP (04/12/87)

In article <3300003@uiucdcsm> grunwald@uiucdcsm.cs.uiuc.edu writes:
 > 
 > Strength reduction is replacement of operations such as "i = j * 8" by
 > "i = j << 3", or "i = j % 8" by "i = j & 0x3" (if j is unsigned).
 > 
 > As for source-to-source translation: not for me.

And that is right, otherwise you might see i * 14 transformed to
(i << 4) - (i << 1), which is slower on some machines.
(The Pascal compiler on one of the systems I use did this, this
'optimization' has been turned off.)
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax

tim@amdcad.UUCP (04/12/87)

In article <3135@jade.BERKELEY.EDU>, desj@lemon.BERKELEY.EDU (David desJardins) writes:
> In article <3300003@uiucdcsm> grunwald@uiucdcsm.cs.uiuc.edu writes:
> >
> >Strength reduction is replacement of operations such as "i = j * 8" by
> >"i = j << 3", or "i = j % 8" by "i = j & 0x3" (if j is unsigned).
>                                   ^^^^^^^^^^^   ^^^^^^^^^^^^^^^^
>    I'm sure you mean "i = j & 0x7".  But, more importantly, this is the
> right remainder to compute even if j is signed (assuming two's-complement,
> of course).  It's too bad more people (especially hardware designers!)
> don't realize this.

I think you should be more explicit in what you are stating here.  The
strength-reduction optimization above does *not* work for signed dividends
(j, in this case) on most current machines, since they define the remainder
to have the same sign as the dividend.  Thus, you seem to be arguing that
this definition be changed to one where remainders are always positive and
integer divide operations always round *down* instead of towards zero (this
is required for

	(a/b)*b + a%b	==	a

to hold).

I seem to remember reading that the PL.8 compiler for the IBM ROMP chip
used this definition for divide operations so that the compiler could
perform this very optimization.

There was a discussion a while (1 yr?) ago about this subject, but I can't
remember what conclusions were made (if any).  Anyone care to refresh our
memories?

	Tim Olson
	Advanced Micro Devices
	(tim@amdcad.AMD.COM)

greg@utcsri.UUCP (04/13/87)

In article <3135@jade.BERKELEY.EDU> desj@brahms.Berkeley.EDU (David desJardins) writes:
>In article <3300003@uiucdcsm> grunwald@uiucdcsm.cs.uiuc.edu writes:
>>
>>Strength reduction is replacement of operations such as "i = j * 8" by
>>"i = j << 3", or "i = j % 8" by "i = j & 0x3" (if j is unsigned).
>                                  ^^^^^^^^^^^   ^^^^^^^^^^^^^^^^
>   I'm sure you mean "i = j & 0x7".  But, more importantly, this is the
>right remainder to compute even if j is signed (assuming two's-complement,
>of course).  It's too bad more people (especially hardware designers!)
>don't realize this.

Only if division of a negative number by a positive one produces a
positive % result. On some machines -11%7 may be -4, while -11/7 is -1.
(In fact 'all machines covered by' K&R work this way).

You are then in trouble if -11%8 is 5 instead of -3. You may believe
this definition of % and / is not 'right' ( there was a fearful, loathsome
debate on net.lang.c about 14 mths ago about this ), but if they work
that way, they have to always work that way, even if the divisor is a
power of 2.

There is another analogous advantage to David's 'right' method (where dividing
a - by a + rounds the quotient to -ve and gives a +ve remainder):  A divide by
a power of two can become an arithmetic right shift.

( the ns32k, always eager to please, provides both forms of % and / ).

[ Just stating some facts. PLEASE don't start this debate again - it's
  a religious issue like byte order ]

>   -- David desJardins


-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...

kent@xanth.UUCP (Kent Paul Dolan) (04/13/87)

In article <3135@jade.BERKELEY.EDU> desj@brahms.Berkeley.EDU (David desJardins) writes:
->In article <3300003@uiucdcsm> grunwald@uiucdcsm.cs.uiuc.edu writes:
->>
->>Strength reduction is replacement of operations such as "i = j * 8" by
->>"i = j << 3", or "i = j % 8" by "i = j & 0x3" (if j is unsigned).
->                                  ^^^^^^^^^^^   ^^^^^^^^^^^^^^^^
->   I'm sure you mean "i = j & 0x7".  But, more importantly, this is the
->right remainder to compute even if j is signed (assuming two's-complement,
->of course).  It's too bad more people (especially hardware designers!)
->don't realize this.
->
->   -- David desJardins

I'd like to second the motion.  I've lost track of the number of languages
and machines in which I have ended up debugging the mod function because the
result for signed numbers was either negative values for mods of negative
numbers, or a mod function which reflected about zero rather than continuing
the same sawtooth pattern through the origin.  Apparently this is a very
easy function to do wrong, and it is done wrong more often than right.

I'd like to encourage compiler and chip designers to review the mod function's
output for negative inputs before releasing any product which supports mod.
Thanks.

Kent.

firth@sei.cmu.edu (Robert Firth) (04/13/87)

In article <3300003@uiucdcsm> grunwald@uiucdcsm.cs.uiuc.edu writes:
>
>Strength reduction is replacement of operations such as "i = j * 8" by
>"i = j << 3", or "i = j % 8" by "i = j & 0x3" (if j is unsigned).

Oh.  I thought strength reduction was replacing i := j*8 by i := i+8
within a loop whose induction variable was j.  So it's explained in
Wulf's The Design of an Optimising Compiler, anyway.

Also, don't you mean "j&7" in the second replacement?  And in the
first, the replacement is invalid if the expression overflows.

firth@sei.cmu.edu (Robert Firth) (04/13/87)

In article <344@zuring.mcvax.cwi.nl> dik@zuring.UUCP (Dik T. Winter) writes:
>And that is right, otherwise you might see i * 14 transformed to
>(i << 4) - (i << 1), which is slower on some machines.
>(The Pascal compiler on one of the systems I use did this, this
>'optimization' has been turned off.)

Also uses an unnecessary register.  If a compiler makes this change it
will emit ((i<<3)-i)<<1.  Transformations such as these are VALID on
(almost) all machines, but they are not necessarily OPTIMISATIONS.  Mr
Winter and others are right - leave these tricks to the compiler.

jfh@killer.UUCP (04/15/87)

In article <3135@jade.BERKELEY.EDU>, desj@lemon.BERKELEY.EDU (David desJardins) writes:
> In article <3300003@uiucdcsm> grunwald@uiucdcsm.cs.uiuc.edu writes:
> >
> >Strength reduction is replacement of operations such as "i = j * 8" by
> >"i = j << 3", or "i = j % 8" by "i = j & 0x3" (if j is unsigned).
>                                   ^^^^^^^^^^^   ^^^^^^^^^^^^^^^^
>    I'm sure you mean "i = j & 0x7".  But, more importantly, this is the
> right remainder to compute even if j is signed (assuming two's-complement,
> of course).  It's too bad more people (especially hardware designers!)
> don't realize this.
> 
>    -- David desJardins

Okay - I have always assumed (No, not THAT word :-) that the code
	i = j & 0x7;
only worked with j being unsigned (or at least positive by use, e.g. array
subscripts).  My understanding is that % returns the remainder of j / 8.

Third grade math time ...
	Try an example ...
		23 / 8 = 2 r 7 right?
	Now the tricky one ...
		-23 / 8 = -2 r -7 <- my intuition says 'Yes'.

This is the way I understand.  Do the expression (in integer math)

	Remainder = Dividend - (Quotient * Divisor)

So -
	Remainder = -23 - (-2 * 8) = -23 - (-16) = -7.

But how come when I ask my HP-41CV what it thinks, I get 1?  If I ask for
INT of -23 / 8 I get -2, not -3. (Yes, -3 is floor (-23.0 / 8.0)).

New third grade math (binary example) ...
	Try an example ... (just remainders, no quotients)
		0010111 & 0000111 = 0000111 = 7 base 10. 
	Now the negative version ...
		1101001 & 0000111 = 0000001 = 1 base 10.

The first way seems to make sense - the second way works better but seems to
require a different quotient be calculated when performing the modulo.

- John 		(jfh@killer.UUCP)

No disclaimer.  Whattcha gonna do, sue me?

grunwald@uiucdcsm.cs.uiuc.edu (04/19/87)

Yesszh, lambastization for a typo: yes, it's supposed to be 0x7, not 0x3.

As for my definition of ``reduction in strength'' -- this is the defn. as
put forth in the Dragon book.  The example they give is the PL/I-ish

	length(S1 || S2)	-> length(S1) + length(S2)

In the case where an expression involving a loop iterand is being reduced in
strength, I think that the term is really induction variable expansion and
elimination.

webber@klinzhai.RUTGERS.EDU (Webber) (04/21/87)

In article <3300004@uiucdcsm>, grunwald@uiucdcsm.cs.uiuc.edu writes:
> ...
> As for my definition of ``reduction in strength'' -- this is the defn. as
> put forth in the Dragon book. Then example they give is the PL/I-ish
> 
> 	length(S1 || S2)	-> length(s1) + length(s2)
> 
> In the case where an expression involving a loop iterand is being reduced in
> strength, I think that the term is really induction variable expansion and
> elimination.

``Reduction in strength'' is a fairly simple optimization technique named in
the Dragon book (i.e., Aho, Sethi, and Ullman's ``Compilers:
Principles, Techniques, and Tools'', 1986).  A much more interesting
notion that predates both the Dragon books is ``strength reduction''
(cf., David Gries' text on Compiler Construction).  Here the idea is
to make transformations like:
       for(i=0;i<10;i++) {printf("%d\n",10*i);}
into
       for(j=i=0;i<10;i++,j+=10) {printf("%d\n",j);}
These kinds of things are very important when the program does a lot
of multidimensional array accesses, because there is a lot of this
sort of thing being generated that the programmer is often unaware of
(e.g., it dates back to early efforts to optimize Fortran code, such
as the classic Fortran H compiler).  It turns out that in section 10.2
of the Aho-Sethi-Ullman text, the notion of reduction in strength (that
was originally defined in section 9.9 in the manner of replacements
like  ``x * 2'' becomes ``x + x'') is extended to the for-loop style
optimization that other people study under the name ``strength
reduction''.  Thus, by the end of the book, ``reduction in strength''
means both kinds of optimizations; however, the earlier notion of
``strength reduction'' was generally only applied to the loop-type
optimization.  Hope this clarified things.

-------------- BOB (webber@aramis.rutgers.edu ; BACKBONE!topaz!webber)

frank@zen.UUCP (Frank Wales) (04/24/87)

In article <775@killer.UUCP> jfh@killer.UUCP (John Haugh) writes:
>Third grade math time ...
>	Try an example ...
>		23 / 8 = 2 r 7 right?
>	Now the tricky one ...
>		-23 / 8 = -2 r -7 <- my intuition says 'Yes'.
>
>This is the way I understand.  Do the expression (in integer math)
>
>	Remainder = Dividend - (Quotient * Divisor)
>
>So -
>	Remainder = -23 - (-2 * 8) = -23 - (-16) = -7.
>
>But how come when I ask my HP-41CV what it thinks, I get 1?  If I ask for
>INT of -23 / 8 I get -2, not -3. (Yes, -3 is floor (-23.0 / 8.0)).
...
>- John 		(jfh@killer.UUCP)

[this is a bit of a digression, but...]

The problem is that there is more than one way of computing the mod
function, and each has an effect on the outcome with negative numbers.
The HP-41 implements MOD as:

		MOD(Y,X) = Y - MINT(Y/X)*X	

where MINT(Y/X) is the maximum integer not larger than Y/X.
Special cases are where X==0 (result is Y) and Y==0 (result is 0). [1]

The HP-71 (the HP-41's big cousin) actually offers three different remainder
functions to get around this problem:

      MOD(X,Y), which works like the 41's MOD;
      RED(X,Y), which evaluates X - Y * N, where N is the nearest integer to
		X/Y -- it is the remainder function defined as REM in the IEEE
		Floating Point Standard;
      RMD(X,Y), which evaluates X - Y * IP(X/Y), where IP is the Integer Part
		of (X/Y) [as would be found by INT on the HP-41] -- it is
		the remainder function defined as REM in the ANSI BASIC
		Standard.
      
As implemented on the HP-71, X==Infinity or Y==0 yield an Invalid Arg error.[2]

Your mission, should you choose to accept it, is to decide which of these the
% operator in C represents (or even, which you think it should represent :-).

{
References:
[1] -- Extend Your HP-41 by W.A.C. Mier-Jedrzejowicz, Ph.D., p 53
[2] -- HP-71 Reference Manual, pp 190, 240, 252.
}

[digression ends...]

Frank Wales                 [frank@zen.uucp<->..!mcvax!ukc!zen.co.uk!frank]
Development Engineer, Zengrange Ltd, Greenfield Rd, Leeds, England, LS9 8DB.

mouse@mcgill-vision.UUCP (der Mouse) (04/26/87)

In article <919@aw.sei.cmu.edu.sei.cmu.edu>, firth@sei.cmu.edu (Robert Firth) writes:
> In article <3300003@uiucdcsm> grunwald@uiucdcsm.cs.uiuc.edu writes:
>> Strength reduction is replacement of operations such as "i = j * 8"
>> by "i = j << 3", or "i = j % 8" by "i = j & 0x3" (if j is unsigned).

> [second example should read 7, not 3].  And in the first, the
> replacement is invalid if the expression overflows.

Why?  j*8 will overflow if and only if j<<3 overflows.  The only way I
can see a problem is if overflow checking is turned on and either (a)
multiply overflows trigger it but shift overflows don't or (b) shift
overflows trigger it but multiply overflows don't.  How many machines
are there that trigger overflow exceptions on the one sort of overflow
but not on the other?  (Yes, a VAX does too trap on a shift overflow,
or at least the Architecture Reference Manual says it's supposed to).

					der Mouse

				(mouse@mcgill-vision.uucp)