[comp.arch] Branch Delay Slots

glew@pdx007.intel.com (Andy Glew) (03/28/91)

>The moral of the story is that the delayed branching should be designed around
>the best that the compiler can do, not the idiosyncrasies of a particular
>implementation.  The compiler should be able to generate code that uses all
>the available pipelining, without worrying about precisely how much pipelining
>that is.

Ummm...

It sometimes annoys me that more people are not aware of the work done
at the University of Illinois on the "forward semantic" style of
delayed branches.  This is a compiler technique that can effectively
use up to 10 delay slots.  The technique is simple: they replicate
code from the branch target into the delay slots.  They permit branches
in the delay slots, in a clean manner that requires only one PC.  They
use trace scheduling to select which code to place in the delay slots
after each branch.  Effective trace scheduling also keeps the growth in
executable size to a fairly small percentage.
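The post gives no instruction set or algorithm details, so what follows is only a toy sketch of the "replicate target code into the delay slots" idea, built on assumptions of my own: a made-up ISA (`addi`, `br`, `nop`, `halt`), every branch treated as predicted taken, and hardware that squashes the slots when the branch falls through. The hypothetical pass `fill_forward_slots` copies up to n straight-line instructions from each branch target into that branch's n slots and retargets the branch past the copies. The hypothetical simulator `run` then checks that the rewritten loop computes the same result. It does not model trace scheduling or branches inside slots.

```python
# Toy model (hypothetical ISA, not IMPACT's): ("addi", reg, imm),
# ("br", reg, target) = branch to target if reg != 0, ("nop",), ("halt",).

def execute(ins, regs):
    """Execute one straight-line instruction."""
    if ins[0] == "addi":
        _, reg, imm = ins
        regs[reg] += imm
    elif ins[0] != "nop":
        raise ValueError(f"not a straight-line instruction: {ins}")

def fill_forward_slots(prog, n):
    """Give every branch n delay slots filled with copies of its target code."""
    # Map each old index to its new index; every branch grows by n slots.
    new_index, pos = {}, 0
    for i, ins in enumerate(prog):
        new_index[i] = pos
        pos += 1 + (n if ins[0] == "br" else 0)
    new_index[len(prog)] = pos

    out = []
    for ins in prog:
        if ins[0] != "br":
            out.append(ins)
            continue
        _, reg, target = ins
        # Copy up to n straight-line instructions starting at the target,
        # stopping at a branch, a halt, or the end of the program.
        copies = []
        while (len(copies) < n and target + len(copies) < len(prog)
               and prog[target + len(copies)][0] in ("addi", "nop")):
            copies.append(prog[target + len(copies)])
        # Retarget past the copied code; pad any unused slots with nops.
        out.append(("br", reg, new_index[target + len(copies)]))
        out.extend(copies)
        out.extend([("nop",)] * (n - len(copies)))
    return out

def run(prog, n, regs):
    """Run prog on a machine whose branches have n forward-semantic slots."""
    regs, pc = dict(regs), 0
    while pc < len(prog) and prog[pc][0] != "halt":
        ins = prog[pc]
        if ins[0] == "br":
            _, reg, target = ins
            if regs[reg] != 0:       # taken: the slots execute
                for slot in prog[pc + 1 : pc + 1 + n]:
                    execute(slot, regs)
                pc = target
            else:                    # not taken: assumed squashed by hardware
                pc += 1 + n
        else:
            execute(ins, regs)
            pc += 1
    return regs

loop = [
    ("addi", "r1", 3),    # 0: r1 = 3
    ("addi", "r2", 5),    # 1: loop: r2 += 5
    ("addi", "r1", -1),   # 2:       r1 -= 1
    ("br", "r1", 1),      # 3:       branch back to loop while r1 != 0
    ("halt",),            # 4
]
start = {"r1": 0, "r2": 0}
print(run(loop, 0, start))                          # {'r1': 0, 'r2': 15}
print(fill_forward_slots(loop, 2)[3])               # ('br', 'r1', 3)
print(run(fill_forward_slots(loop, 2), 2, start))   # {'r1': 0, 'r2': 15}
```

With n = 2 the whole two-instruction loop body fits in the slots, so the rewritten branch targets itself and no slot is wasted on a nop. With n = 4 the copy stops at the branch and two slots are left as nops.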

Enuff said: I hope that someone from the IMPACT Group can be prompted to
provide more details on the forward semantic.

--
---

Andy Glew, glew@ichips.intel.com
Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway, 
Hillsboro, Oregon 97124-6497

This is a private posting; it does not indicate opinions or positions
of Intel Corp.