[net.arch] Where are delayed branches handled?

radford@calgary.UUCP (08/11/86)

Something I've wondered about:

On RISC machines with delayed branches, who does the instruction
rearrangement that tries to put useful instructions after the branch?

Obvious possibilities are the assembler or the compiler. Does anyone
know whether the compiler can do a significantly better job than 
the assembler? (I'm assuming the compiler generates assembler code.)

    Radford Neal
    The University of Calgary

mash@mips.UUCP (John Mashey) (08/12/86)

In article <299@vaxb.calgary.UUCP> radford@calgary.UUCP (Radford Neal) writes:
>On RISC machines with delayed branches, who does the instruction
>rearrangement that tries to put useful instructions after the branch?
>Obvious possibilities are the assembler or the compiler. Does anyone
>know whether the compiler can do a significantly better job than 
>the assembler? (I'm assuming the compiler generates assembler code.)

1) Ours is in the assembler, and I think most others are, also.  I haven't
seen one that was in the compiler; maybe others have and would say so.

2) Our assembler does a lot of code selection itself to make life saner
for the compilers, e.g., load immediate reg,value can be 1 or 2 instructions,
and multiply by a constant yields all sorts of sequences.
Pipeline reorganization naturally happens after such expansions.
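
For instance, here is a schematic of the load-immediate case (my own
illustration, not the assembler's actual code-selection logic; the
mnemonics are the obvious MIPS-ish ones):

	/* Illustrative only: a constant that fits in 16 bits loads in one
	   instruction; a full 32-bit value needs an upper/lower pair. */
	#include <stdio.h>

	void expand_li(const char *reg, long value)
	{
	    if (value >= 0 && value <= 0xFFFF)
	        printf("\tori\t%s,$0,0x%lx\n", reg, value);   /* 1 instruction */
	    else if (value >= -0x8000 && value < 0)
	        printf("\taddiu\t%s,$0,%ld\n", reg, value);   /* 1 instruction */
	    else {
	        unsigned long v = (unsigned long)value & 0xFFFFFFFFUL;
	        printf("\tlui\t%s,0x%lx\n", reg, (v >> 16) & 0xFFFF);  /* upper */
	        printf("\tori\t%s,%s,0x%lx\n", reg, reg, v & 0xFFFF);  /* lower */
	    }
	}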

3) It's possible that compilers might do it better; however, I don't see much
evidence of this [and I've looked at many 1000s of lines of disassembled code]:
Either:
	a) The reorganizer already fills the branch delay slot.
	b) The branch delay is there, and there's nothing anyone could 
	conceivably do about it.
	c) A modest smartening of the reorganizer would get rid of it,
	perhaps also by passing a little more information from earlier passes.
Most are in a) or b), and my gut feeling [no data] is that compilers can't
get much of c) that the assembler can't.
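
To make a)-c) concrete, the reorganizer's basic move looks roughly like
this (a toy sketch with an invented instruction representation; the real
thing tracks much more):

	/* Toy model: the block ends with a branch and then a nop in the
	   delay slot; registers read and written are kept as bit masks.
	   Returns the index of an instruction that can be hoisted into the
	   slot, or -1 if nothing can be moved safely. */
	struct insn {
	    unsigned reads, writes;     /* one bit per register */
	    int is_branch, is_nop;
	};

	int pick_slot_filler(struct insn *b, int n)
	{
	    int i, j;

	    if (n < 3 || !b[n-1].is_nop)
	        return -1;                      /* case a): slot already useful */
	    for (i = n - 3; i >= 0; i--) {      /* scan upward from the branch */
	        int clash = b[i].is_branch;
	        for (j = i + 1; j <= n - 2; j++)   /* all it would move past */
	            if ((b[i].writes & (b[j].reads | b[j].writes)) ||
	                (b[i].reads & b[j].writes))
	                clash = 1;
	        if (!clash)
	            return i;                   /* hoist b[i] into the slot */
	    }
	    return -1;                          /* case b): nothing can move */
	}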
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

steve@jplgodo.UUCP (Steve Schlaifer x43171 301/167) (08/13/86)

In article <299@vaxb.calgary.UUCP>, radford@calgary.UUCP (Radford Neal) writes:
> Something I've wondered about:
> 
> On RISC machines with delayed branches, who does the instruction
> rearrangement that tries to put useful instructions after the branch?
> 
> Obvious possibilities are the assembler or the compiler. Does anyone
> know whether the compiler can do a significantly better job than 
> the assembler? (I'm assuming the compiler generates assembler code.)

An optimizing assembler?  What a disaster that would be.  I want assemblers to
leave my code exactly as I wrote it.  Compilers, on the other hand, translate
from high-level to low-level languages and have a long tradition of performing
all kinds of optimizations on their output.  Moving code into the slot after a
delayed branch is just another kind of optimization.

To answer your question, the compiler is the only place where such
optimizations can reasonably be done.
-- 

...smeagol\			Steve Schlaifer
......wlbr->!jplgodo!steve	Advance Projects Group, Jet Propulsion Labs
....logico/			4800 Oak Grove Drive, M/S 156/204
				Pasadena, California, 91109
					+1 818 354 3171

quiroz@rochester.ARPA (Cesar Quiroz) (08/14/86)

>> In article <299@vaxb.calgary.UUCP>, radford@calgary.UUCP (Radford
>> Neal) writes:
>> > Something I've wondered about:
>> > 
>> > On RISC machines with delayed branches, who does the instruction
>> > rearrangement that tries to put useful instructions after the branch?
>> > 

To which steve@jplgodo.UUCP replies:
>> An optimizing assembler?  What a disaster that would be.  I  want
>> assemblers to leave my code exactly as I wrote it.

This position assumes that there is a point in writing code in the
lowest levels of a RISC architecture and that strict adherence to
the letter of such code is desirable.

Although both points have some strength when applied to conventional
machines, they weaken in more complicated cases (for instance, RISCs
or machines with strange synchronization requirements).  The assemblers
for those machines are designed mainly as postprocessors for compiler
output, so they are part of the compilation process (in Unix terms, they
let you put your layout for a.out in a single program, rather than in
each compiler).  Seen from this perspective, it is not outrageous that
those assemblers take liberties with their input (which could be made
optional anyway).

On a slightly different point, there is precedent for assemblers
helping in opcode selection.  Span-dependent jumps and calls, for
instance, can be selected by the assembler, so you don't have to
keep a running location-counter as you code; you just say 'j-or-b
foo' and the assembler decides on a jump or a branch, depending on
how far away foo ends up being.
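
Schematically (an illustrative sketch; the opcodes and the reach are
made up):

	/* Schematic 'j-or-b foo': use the short PC-relative branch when foo
	   is within reach, else the long absolute jump.  A real assembler
	   also iterates, since each choice changes the distances of
	   everything after it. */
	#include <stdio.h>
	#include <stdlib.h>

	#define SHORT_REACH 32768L      /* assumed range of the short form */

	void emit_jump_or_branch(long here, long target)
	{
	    if (labs(target - here) < SHORT_REACH)
	        printf("\tb\t%ld\n", target);   /* short PC-relative branch */
	    else
	        printf("\tj\t%ld\n", target);   /* long absolute jump */
	}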

If you are forced to write code in those assemblers AND YOU WANT
NO-OPS in the delay slots, I guess you have the right to insist on
keeping that as an option.  The same goes for SIMD machines with
software control of the pipeline interlocks, microprogrammable
graphics systems, and what-have-you: if the assembler level contains
any weirdness that you might not want to see, let the assembler do
some clean-up, but keep straight code production as an option.
-- 
Cesar Augusto  Quiroz Gonzalez
Department of Computer Science     {allegra|seismo}!rochester!quiroz
University of Rochester            or
Rochester,  NY 14627               quiroz@ROCHESTER

josh@polaris.UUCP (Josh Knight) (08/15/86)

In article <612@mips.UUCP> mash@mips.UUCP (John Mashey) writes:
>In article <299@vaxb.calgary.UUCP> radford@calgary.UUCP (Radford Neal) writes:
>>On RISC machines with delayed branches, who does the instruction
>>rearrangement that tries to put useful instructions after the branch?
>>Obvious possibilities are the assembler or the compiler. Does anyone
>>know whether the compiler can do a significantly better job than 
>>the assembler? (I'm assuming the compiler generates assembler code.)
>
>1) Ours is in the assembler, and I think most others are, also.  I haven't
>seen one that was in the compiler; maybe others have and would say so.
 
I don't believe the PL.8 compiler for the 801 (or for the ROMP) generates
assembly language.  That is, any code movement to take advantage of delayed
branches is done by the compiler.
 
-- 

	Josh Knight, IBM T.J. Watson Research
 josh@ibm.com, josh@yktvmh.bitnet,  ...!philabs!polaris!josh

patc@tekcrl.UUCP (Pat Caudill) (08/15/86)

	Although I usually want an assembler to assemble
what I coded, there is precedent for assemblers doing
optimization, beyond long vs. short jumps and opcode selection
for immediate, short, or long offsets, or for macros.  I believe
SOAP (the Symbolic Optimizing Assembly Program) did instruction
scheduling for a machine on which this was very important
and also very difficult.  Code for that machine was written mainly
in assembly language.  The machine is now somewhat dated,
however.

			Pat Caudill
			Tektronix!tekcrl!patc.UUCP

johnl@ima.UUCP (John R. Levine) (08/15/86)

In the AIX C compiler for the RT PC, we handled the delayed branches in the
peepholer for the compiler.  I wrote a very vanilla assembler by mutating
(a lot) the Sys III Vax assembler, so it did nothing more fancy than choosing
instruction formats and handling long vs. short jumps.

The peepholer seemed to be the best place because it could make assumptions
about the kind of code emitted by the compiler.  Doing the right thing was
a little tricky because only about half of the RT's instructions change the
condition code, and you had to look back in the instruction stream for one
of those.
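
Schematically, that look-back amounts to something like this (illustrative
only, with an invented instruction representation, not the compiler's real
one):

	/* Walk backwards from just above a conditional branch to the most
	   recent instruction that sets the condition code the branch tests. */
	struct rt_insn {
	    int sets_cc;        /* roughly half the RT's instructions set it */
	    /* ... opcode, operands, and so on ... */
	};

	int find_cc_setter(struct rt_insn *code, int branch_index)
	{
	    int i;

	    for (i = branch_index - 1; i >= 0; i--)
	        if (code[i].sets_cc)
	            return i;
	    return -1;          /* set in an earlier block; be conservative */
	}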
-- 
John R. Levine, Javelin Software Corp., Cambridge MA +1 617 494 1400
{ ihnp4 | decvax | cbosgd | harvard | yale }!ima!johnl, Levine@YALE.EDU
The opinions expressed herein are solely those of a 12-year-old hacker
who has broken into my account and not those of any person or organization.

mash@mips.UUCP (John Mashey) (08/16/86)

In article <823@jplgodo.UUCP> steve@jplgodo.UUCP (Steve Schlaifer x43171 301/167) writes:
>An optimizing assembler?  What a disaster that would be.  I want assemblers to
>leave my code exactly as I wrote it....
>
>To answer your question, the compiler is the only place where such
>optimizations can reasonably be done.

1) This is an example of an authoritatively stated non-fact, a thing seen
all too often on the net.  A better way to express this opinion might be:
"It is my opinion that compilers are the place to do this.  Does anybody
know of real counterexamples?"

2) The MIPS assembler does this all the time, and it works fine.

3) There are occasional cases where you do want to turn it off: for
example: architectural verification tests often want to generate very
precisely-chosen code.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

dms@mit-hermes.ARPA (David M. Siegel) (08/16/86)

   From: steve@jplgodo.UUCP
   Newsgroups: net.arch
   Date: 13 Aug 86 16:45:20 GMT

   An optimizing assembler?  What a disaster that would be.  

Actually, optimizing assemblers are fairly common, since they are very
easy to write.  If you have an existing compiler that doesn't optimize,
and you need to boost performance a bit, fiddling directly with the
assembler code output is the easiest thing to do.
-- 

					-Dave

mash@mips.UUCP (John Mashey) (08/17/86)

In article <187@ima.UUCP> johnl@ima.UUCP (John R. Levine) writes:
>In the AIX C compiler for the RT PC, we handled the delayed branches in the
>peepholer for the compiler.
>The peepholer seemed to be the best place because it could make assumptions
>about the kind of code emitted by the compiler.  Doing the right thing was
>a little tricky because only about half of the RT's instructions change the
>condition code, and you had to look back in the instruction stream ....

1) It's good to see more data: there seems to be a consensus that reorganization
is either in the assembler or very late (peephole time) in the compiler.

2) The condition code note is interesting: it's exactly this kind of thing
that encouraged us to remove any trace of condition codes!
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

mash@mips.UUCP (John Mashey) (08/20/86)

In article <688@polaris.UUCP> josh@polaris.UUCP (Josh Knight) writes:
...
>>>On RISC machines with delayed branches, who does the instruction
>>>rearrangement that tries to put useful instructions after the branch?
...
>>
>>1) Ours is in the assembler, and I think most others are, also.  I haven't
>>seen one that was in the compiler; maybe others have and would say so.
> 
>I don't believe the PL.8 compiler for the 801 (or for the ROMP) generates
>assembly language.  I.e. any code movement to take advantage of delayed
>branches is done by the compiler.

Thanx for the note.  I reread the HP Spectrum papers, and they seem to
do it in the compiler also, although, as far as I could tell, fairly
late, i.e., at peephole time.  Ours passes binary between compiler and
assembler, i.e., the "assembler" is really an ASCII-to-binary conversion
in front of the common assembler.  Maybe the real issues are:
	a) When you write handcoded assembler, do you get the optimizations
	or not?
	b) Is the pipeline scheduling early in the compilation, or late?
	(Regardless of the implementation, most cases seem to be late,
	i.e., at peephole time, whether inside the compiler or the assembler.
	Of course, careful code generation helps the pipeline scheduler
	through good choices of registers and other things.)  Now, does anyone
	have some instances where pipeline scheduling is done by, for
	example, the global optimization phase?
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

patcl@hammer.UUCP (Pat Clancy) (08/21/86)

In article <624@mips.UUCP> mash@mips.UUCP (John Mashey) writes:
>In article <823@jplgodo.UUCP> steve@jplgodo.UUCP (Steve Schlaifer x43171 301/167) writes:
>>An optimizing assembler?  What a disaster that would be.  I want assemblers to
>>leave my code exactly as I wrote it....
>>
>>To answer your question, the compiler is the only place where such
>>optimizations can reasonably be done.
>
>2) The MIPS assembler does this all the time, and it works fine.
>

There's also a more fundamental issue involved in the choice of
where to do reorganization; namely, the potential interdependence
between register allocation and reorganization opportunities.
This is especially apparent with global (e.g., coloring) allocation
schemes.  A particular choice of register assignment will determine
(to some extent) how code may be rearranged to fill delay slots,
handle interlocks, etc.  Then, if the rearranged code were run once
again through the register allocator, a better register assignment
might be found (since the live ranges of values are now different).
Then that output from the allocator might be run through the
reorganizer again..., and so on.  This kind of feedback approach is
impossible if register allocation and reorganization are done in
different programs (compiler and assembler, respectively).
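
In outline, the feedback loop would look something like this (purely a
sketch; the phases here are stand-ins, not anybody's real compiler):

	/* Alternate register allocation and reorganization until the code
	   stops changing, or give up after a few rounds.  "code_t" is an
	   opaque handle; the function pointers stand in for whole phases. */
	typedef struct code code_t;

	code_t *allocate_and_reorganize(code_t *code,
	                                code_t *(*allocate)(code_t *),
	                                code_t *(*reorganize)(code_t *),
	                                int (*same)(code_t *, code_t *))
	{
	    int round;

	    for (round = 0; round < 3; round++) {
	        code_t *next = allocate(reorganize(code));
	        if (same(next, code))       /* reached a fixed point */
	            return next;
	        code = next;
	    }
	    return code;                    /* good enough; stop iterating */
	}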

Pat Clancy
Tektronix

srm@iris.berkeley.edu (Richard Mateosian) (08/22/86)

>I believe the SOAP (Symbolic Optimizing Assembly Program) did instruction
>scheduling.

Actually not exactly.  It never reordered instructions.  On the IBM 650, 
the main memory was a 2000-word drum, and each instruction had the format

           op  addr   addr

The first addr was the drum address of the operand, and the second was the
drum address of the next instruction. (You thought linked lists were
invented by software people?)

SOAP let you omit specifying the second addr.  Then, by taking into account
how far the drum would turn during execution of op, it found an optimal
drum placement for the next instruction.
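
In outline (an illustrative sketch only; the flat word-times-per-op model
is invented, and SOAP's real rules were more involved):

	/* The drum keeps rotating while the current operation executes, so
	   put the next instruction at the word that will be coming up under
	   a read head when the operation finishes. */
	int soap_place_next(int current_addr, int op_word_times)
	{
	    return (current_addr + op_word_times) % 2000;   /* 2000-word drum */
	}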


Richard Mateosian    ...ucbvax!ucbiris!srm 	     2919 Forest Avenue     
415/540-7745         srm%ucbiris@Berkeley.EDU        Berkeley, CA  94705    

alverson@decwrl.DEC.COM (Robert Alverson) (08/22/86)

There is an aspect of this that I think people are missing.  If you let
the assembler take care of design-specific architectural features, then
those features can be more easily changed in the future.  I think the
MIPS-X assembler also does the instruction scheduling.  Of course, if
all your programs are in an HLL, scheduling in the code generator seems
just as appropriate.  There are some people, though, who feel that a very
low level intermediate form (almost assembler) is desirable for its
independence across a family of machines.

Bob

echo $DISCLAIMER

agr@vaxine.UUCP (Arnold Reinhold) (08/26/86)

In article <931@tekcrl.UUCP> patc@tekcrl.UUCP (Pat Caudill) writes:
>
>	Although I usually want an assembler to assemble
>what I coded, there is precedent for assemblers doing
>optimization, beyond long vs. short jumps and opcode selection
>for immediate, short, or long offsets, or for macros.  I believe
>SOAP (the Symbolic Optimizing Assembly Program) did instruction
>scheduling for a machine on which this was very important
>and also very difficult.  Code for that machine was written mainly
>in assembly language.  The machine is now somewhat dated,
>however.
>
The IBM 650 was a drum-memory machine with 2000 ten-digit decimal words.
Instructions consisted of a two-digit op code, a four-digit operand address,
and a four-digit address for the next instruction.  Optimization consisted
of placing the next instruction so it would be just in front of one of the
ten read heads when the previous instruction had finished.

SOAP did this with a crude approximation of execution times; there wasn't
room for a complete table.  Real programmers wrote in machine language
and optimized with the aid of a preprinted IBM chart of all memory
locations, so you could check them off as you used them.  SOAP did not
change the order of execution of instructions at all, however.

For sanity, any assembler should have a what-you-wrote-is-what-you-get
option.  It would seem vital for the diagnostic programmer, if no one else.

Arnold Reinhold
Automatix Inc.
vaxine!agr