firth@sei.cmu.edu (Robert Firth) (02/27/89)
[ NOT (a OR b) => a NOR b ] csimmons@oracle.UUCP (Charles Simmons) wrote: Since the 'nor' instruction doesn't directly map into the C language, I didn't really expect the compiler to handle this minor special case.) w-colinp@microsoft.UUCP (Colin Plumb) writes: Really? I do. Since the sequence "or $c, $a, $b; not $c, $c" is exactly equivalent to "nor $c, $a, $b" and easily recognised by a peephole optimiser. cik@l.cc.purdue.edu (Herman Rubin) writes: There is no question that in some cases a peephole optimizer can catch things like this. But this requires that someone has anticipated the problem and built it into the optimizer. And how big is the peephole? Here's a data point; I rather think it supports everybody's thesis. When the 'nor' idiom was first posted, I immediately tried it on my MIPS codegenerator. The 'nor' was not generated; the less efficient 'or' followed by 'not' came out. So there's another compiler that missed the good code. However, the reason was very simple - I'd completely overlooked the equivalence when writing the codegenerator. So the failure was human error. Finally, the fix was equally simple: it was necessary to add one more element to a lookup table in a specific part of the codegenerator, that basically says [OR;NOT => NOR]. Of course, none of this matters if you have some automated way of generating the peephole optimisation rules, which seems to me much the best solution to the problem.
jbs@fenchurch.mit.edu (Jeff Siegal) (02/28/89)
>cik@l.cc.purdue.edu (Herman Rubin) writes: > > [...]And how big is the peephole? Every study I've seen claims that three instructions is very nearly always sufficient. The exception being very complex instructions which might be recognized as a substitution for a long sequence of simple instructions. In reality, the long sequence of simple instructions is very unlikely to *exactly* match the complex instruction anyway, so the very long peephole would almost never be of value. For RISC machines, this is not likely to be an issue, and three instructions would (probably) suffice. Multiple passes can often be used to get the same effect as a longer peephole (by using intermediate transformations). Jeff Siegal
tps@chem.ucsd.edu (Tom Stockfisch) (03/03/89)
In article <11201@eddie.MIT.EDU> jbs@fenchurch.UUCP (Jeff Siegal) writes: >>cik@l.cc.purdue.edu (Herman Rubin) writes: >> [...]And how big is the peephole? >Every study I've seen claims that three instructions is very nearly >always sufficient.... >For RISC machines, this is not likely to be an issue, and three >instructions would (probably) suffice. I don't think three instructions is enough if your RISC machine has a lot of parallelization and scheduling is considered. For instance, our Celerity has two integer independent processors that share one memory bus. A fetch ("f") takes 4 cycles if the bus isn't being used by the other processor, and up to 8 cycles if it is. If the results of the fetch are not needed immediately a processor can keep executing while it waits for the bus. Thus, a sequence such as f r1, r1 # fetch from address in r1 rcv r1 # put result in r1 -- really part of fetch instruc bxor r1, r5 # use the result in another operation swll r2, r2 # do something unrelated to r1 swll r3, r3 # ditto swll r4, r4 # ditto runs much faster if written as f r1, r1 rcv r1 swll r2, r2 # the "swll"s now execute if 2nd processor swll r3, r3 # is hogging memory bus swll r4, r4 bxor r1, r5 # no harm waiting till now to do this Our peephole optimizer appears to have a 3 instruction window and will change the first sequence to f r1, r1 rcv r1 swll r2, r2 # too bad only one "swll" gets moved bxor r1, r5 swll r3, r3 swll r4, r4 I have hand-moved instruction sequences like this and gotten considerable speed improvement when more than one process is executing on the machine. >Multiple passes can often be used to get the same effect as a longer >peephole (by using intermediate transformations). I don't think that would help here. In fact, to do the best job of scheduling an extremely large window can be necessary. Perhaps some specialized scheduler pass would be more practical. -- || Tom Stockfisch, UCSD Chemistry tps@chem.ucsd.edu
jbs@fenchurch.mit.edu (Jeff Siegal) (03/05/89)
In article <420@chem.ucsd.EDU> tps@chem.ucsd.edu (Tom Stockfisch) writes: >I don't think three instructions is enough if your RISC machine has >a lot of parallelization and scheduling is considered. >[...]Perhaps some >specialized scheduler pass would be more practical. I have to agree with this last assessment. In general, scheduling is very different from peephole optimization. A peephole optimizer usually transforms a code sequence into an equivalent (but shorter) code sequence, while a scheduler will be concerned with ordering of instructions. In addition, the strategies for selecting the optimal (or at least better) are quite different from those used in a peephole optimizer. Jeff Siegal