[comp.arch] Peephole optimisation

firth@sei.cmu.edu (Robert Firth) (02/27/89)

	[ NOT (a OR b) => a NOR b ]

csimmons@oracle.UUCP (Charles Simmons) wrote:

   Since the 'nor' instruction doesn't directly map into the C language,
   I didn't really expect the compiler to handle this minor special case.)

w-colinp@microsoft.UUCP (Colin Plumb) writes:

   Really?  I do.  Since the sequence "or $c, $a, $b; not $c, $c" is
   exactly equivalent to "nor $c, $a, $b" and easily recognised by a
   peephole optimiser.

cik@l.cc.purdue.edu (Herman Rubin) writes:

   There is no question that in some cases a peephole optimizer can catch
   things like this.  But this requires that someone has anticipated the
   problem and built it into the optimizer.  And how big is the peephole?

Here's a data point; I rather think it supports everybody's thesis.
When the 'nor' idiom was first posted, I immediately tried it on my
MIPS codegenerator.  The 'nor' was not generated; the less efficient
'or' followed by 'not' came out.  So there's another compiler that
missed the good code.

However, the reason was very simple - I'd completely overlooked the
equivalence when writing the codegenerator.  So the failure was human
error.

Finally, the fix was equally simple: it was necessary to add one
more element to a lookup table in a specific part of the codegenerator,
that basically says [OR;NOT => NOR].

Of course, none of this matters if you have some automated way of
generating the peephole optimisation rules, which seems to me much the
best solution to the problem.

jbs@fenchurch.mit.edu (Jeff Siegal) (02/28/89)

>cik@l.cc.purdue.edu (Herman Rubin) writes:
>
>   [...]And how big is the peephole?

Every study I've seen claims that three instructions is very nearly
always sufficient.  The exception being very complex instructions
which might be recognized as a substitution for a long sequence of
simple instructions.  In reality, the long sequence of simple
instructions is very unlikely to *exactly* match the complex
instruction anyway, so the very long peephole would almost never be of
value.

For RISC machines, this is not likely to be an issue, and three
instructions would (probably) suffice.

Multiple passes can often be used to get the same effect as a longer
peephole (by using intermediate transformations).

Jeff Siegal

tps@chem.ucsd.edu (Tom Stockfisch) (03/03/89)

In article <11201@eddie.MIT.EDU> jbs@fenchurch.UUCP (Jeff Siegal) writes:
>>cik@l.cc.purdue.edu (Herman Rubin) writes:
>>   [...]And how big is the peephole?

>Every study I've seen claims that three instructions is very nearly
>always sufficient....
>For RISC machines, this is not likely to be an issue, and three
>instructions would (probably) suffice.

I don't think three instructions is enough if your RISC machine has
a lot of parallelization and scheduling is considered.
For instance, our Celerity has two integer
independent processors that share one memory bus.  A fetch ("f") takes
4 cycles if the bus isn't being used by the other processor, and up to
8 cycles if it is.  If the results of the fetch are not needed immediately
a processor can keep executing while it waits for the bus.  Thus, a sequence
such as

	f r1, r1	# fetch from address in r1
	rcv r1		# put result in r1 -- really part of fetch instruc
	bxor r1, r5	# use the result in another operation
	swll r2, r2	# do something unrelated to r1
	swll r3, r3	# ditto
	swll r4, r4	# ditto

runs much faster if written as

	f r1, r1
	rcv r1
	swll r2, r2	# the "swll"s now execute if 2nd processor
	swll r3, r3	# is hogging memory bus
	swll r4, r4
	bxor r1, r5	# no harm waiting till now to do this

Our peephole optimizer appears to have a 3 instruction window and will
change the first sequence to

	f r1, r1
	rcv r1
	swll r2, r2	# too bad only one "swll" gets moved
	bxor r1, r5
	swll r3, r3
	swll r4, r4

I have hand-moved instruction sequences like this and gotten considerable
speed improvement when more than one process is executing on the machine.

>Multiple passes can often be used to get the same effect as a longer
>peephole (by using intermediate transformations).

I don't think that would help here.  In fact, to do the best job of
scheduling an extremely large window can be necessary.  Perhaps some
specialized scheduler pass would be more practical.
-- 

|| Tom Stockfisch, UCSD Chemistry	tps@chem.ucsd.edu

jbs@fenchurch.mit.edu (Jeff Siegal) (03/05/89)

In article <420@chem.ucsd.EDU> tps@chem.ucsd.edu (Tom Stockfisch) writes:
>I don't think three instructions is enough if your RISC machine has
>a lot of parallelization and scheduling is considered.
>[...]Perhaps some
>specialized scheduler pass would be more practical.

I have to agree with this last assessment.  In general, scheduling is
very different from peephole optimization.  A peephole optimizer
usually transforms a code sequence into an equivalent (but shorter)
code sequence, while a scheduler will be concerned with ordering of
instructions.  

In addition, the strategies for selecting the optimal (or at least
better) are quite different from those used in a peephole optimizer.

Jeff Siegal