[net.arch] RISC delayed branch

firth@bd.sei.cmu.edu (Robert Firth) (08/12/86)
Can an Assembler do as well as a Compiler in
moving code to fill the Noop after a delayed branch?

When I tried it that way, the answer was "almost
as well".  Actually, what I used was a typical
"moving window" peephole optimisation pass over
the generated assembler code, but the assembler
itself could do that too.

To a rough approximation, there is about a 40%
chance that two adjacent instructions are independent,
ie can be permuted, assuming neither is a conditional
branch.  What we want to do is change

	ACTION
	BRANCH

into

	BRANCH
	ACTION

and also

	ACTION
	TEST
	COND BRANCH

into

	TEST
	COND BRANCH
	ACTION

Well, with a window 4 instructions wide, and the
above 40% rule, you can permute something into the
noop slot about 82% of the time, which is pretty
good.
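
For illustration, here is a minimal sketch of such a peephole pass. It is not the original code: the Instr record, the "kind" tags, the read/write sets and the "reach" parameter are all assumptions made for the example. It supposes the code generator has put a NOP after every branch, and it moves an earlier ACTION into that slot only when the ACTION is independent of everything it has to hop over.

```python
from dataclasses import dataclass

BRANCHES = ("branch", "cbranch")

@dataclass
class Instr:
    text: str                        # assembler text, for display only
    kind: str = "action"             # action, test, branch, cbranch, nop, label
    reads: frozenset = frozenset()   # registers/flags read ("mem" can stand for memory)
    writes: frozenset = frozenset()  # registers/flags written

def independent(a, b):
    """True if a and b may be permuted: no read/write or write/write overlap."""
    return not (a.writes & b.reads or a.reads & b.writes or a.writes & b.writes)

def fill_delay_slots(code, reach=2):
    """Replace the NOP after each branch with an earlier independent ACTION.

    reach (hypothetical parameter) is how many instructions above the
    TEST/BRANCH group are examined as candidates.
    """
    out = list(code)
    i = 0
    while i + 1 < len(out):
        if out[i].kind in BRANCHES and out[i + 1].kind == "nop":
            # A conditional branch travels together with the TEST before it.
            start = i
            if out[i].kind == "cbranch" and i > 0 and out[i - 1].kind == "test":
                start = i - 1
            for cand in range(start - 1, max(start - 1 - reach, -1), -1):
                c = out[cand]
                if c.kind != "action":
                    break            # never hop over labels, branches, tests
                if cand > 0 and out[cand - 1].kind in BRANCHES:
                    break            # c already sits in an earlier delay slot
                if all(independent(c, x) for x in out[cand + 1:i + 1]):
                    out.pop(cand)    # branch is now at i-1, the NOP at i
                    out[i] = c       # overwrite the NOP with the ACTION
                    break
        i += 1
    return out

code = [
    Instr("add r1,r2,r3", reads={"r2", "r3"}, writes={"r1"}),
    Instr("cmp r4,r5", "test", reads={"r4", "r5"}, writes={"cc"}),
    Instr("beq L1", "cbranch", reads={"cc"}),
    Instr("nop", "nop"),
]
for ins in fill_delay_slots(code):
    print(ins.text)
```

On the sample input the add is independent of both the cmp and the beq, so it ends up in the slot after the branch. Had the add written r4, the NOP would stay where it is.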

The one significant case I found where a compiler
could do better was when a loop could be rotated
to fill the noop, ie an ACTION brought down from
the top of iteration N+1 to fill the hole after the
branch back from iteration N.  Not easy for an Assembler,
but not very hard for a compiler using a graph form
of the code.
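
The shape of that rotation can be sketched as below. This shows only the rewrite, not the graph analysis a compiler would need to prove it safe. The list-of-strings form and the name rotate_loop are made up for the example, and it assumes an unconditional branch back, a loop entered only by falling in at the top, and no other jumps to the label. With a conditional branch back, the copied ACTION would also run on loop exit, so it would have to be harmless there.

```python
def rotate_loop(lines, label="top"):
    """Rotate a loop so its first ACTION fills the slot after the back branch.

    Expects:   top: / A / ... / BRANCH top / NOP
    Produces:  A / top: / ... / BRANCH top / A
    Anything that does not match that shape is returned unchanged.
    """
    head = label + ":"
    back = "BRANCH " + label
    if (len(lines) < 4 or lines[0] != head
            or lines[-2] != back or lines[-1] != "NOP"):
        return list(lines)
    first = lines[1]
    if first.startswith("BRANCH") or first.endswith(":"):
        return list(lines)           # only a plain ACTION may be rotated
    # One copy of A runs before the loop is entered; the other sits in the
    # delay slot and serves as the start of iteration N+1.
    return [first, head] + list(lines[2:-1]) + [first]

loop = ["top:", "A", "B", "C", "BRANCH top", "NOP"]
for line in rotate_loop(loop):
    print(line)
```

The cost is one extra copy of A ahead of the loop; the gain is that no NOP is executed on any iteration.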

Robert Firth