[gnu.gcc] Porting gcc to the new Sun SPARCstation 1 and SPARCstation 300 series

earle@JATO.JPL.NASA.GOV (Greg Earle - Sun Software Support) (05/03/89)

Anyone (FSF, whoever) considering porting gcc to the new SPARC-based
Sun workstations that we just released (SPARCstation-1 a.k.a. Sun-4/60; and
the SPARCstation/SPARCserver 300 series, a.k.a. Sun-4/330, Sun-4/370, and 390)
may be interested in the following fact:

The Sun-4/260 has a write-back cache, and thus it is not bothered by any
back-to-back writes done by a program.

On the other hand, the new SPARCstation machines have a write-through cache.
Both write buffers are only one write deep.  On the SPARCstation-1, it is
32 bits deep; on the SPARCstation/SPARCserver 300 series, it is 64 bits deep.
Because of this, back-to-back writes can tie up the memory bus.
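
To make the cost concrete, here is a toy cycle-count model of a one-entry
write buffer.  It is only a sketch: the one-cycle issue time and the
three-cycle drain time are assumed numbers chosen for illustration, not
measured SPARCstation figures, and the name count_cycles is hypothetical.

```python
def count_cycles(ops, drain_cycles=3):
    """Count cycles for a sequence of 'st' (store) and 'op' (anything else).

    Assumptions (illustrative only): every instruction issues in one cycle,
    and a store occupies the single write-buffer entry for `drain_cycles`
    cycles while it drains to memory.
    """
    cycle = 0            # current time
    buffer_free_at = 0   # cycle at which the one-entry buffer is empty again
    for op in ops:
        if op == "st":
            # A store must wait until the previous store has drained.
            cycle = max(cycle, buffer_free_at)
            buffer_free_at = cycle + drain_cycles
        cycle += 1       # issue the instruction itself
    return cycle


back_to_back = ["st", "st", "op", "op"]
spread_out = ["st", "op", "op", "st"]
print(count_cycles(back_to_back))  # 6
print(count_cycles(spread_out))    # 4
```

Under these assumed timings the same four instructions take six cycles
with adjacent stores and four when other work separates them.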

Sun's next version of the FORTRAN compiler was modified to know about this.
I don't believe the C compiler was, however.  If it is possible to teach
the SPARC version of gcc about this, I hope this is enough information to
work around the problem.  I don't know enough about compiler technology to
know how one would recognize a back-to-back write during code generation;
I hope someone out there does.

	- Greg Earle
	  Sun Microsystems, Inc.
	  JPL on-site Software Support
	  earle@Sun.COM
	  earle@mahendo.JPL.NASA.GOV	(Guest)

bunda@cs.utexas.edu (John Bunda) (05/03/89)

In article <8905030323.AA27122@jato.Jpl.Nasa.Gov>, earle@JATO.JPL.NASA.GOV (Greg Earle - Sun Software Support) writes:

> [is it possible to modify gcc to avoid back-to-back writes?]

I have been working on scheduling code to allow gcc to reorder
instructions to deal with this problem in general, but that
doesn't help at the moment.  Short of hacking gcc proper,
it *might* be possible to catch some cases with peepholes in
the machine description, but this would not be a complete
(or satisfying) solution.
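
As a rough illustration of what a peephole-style fix could catch, the
sketch below scans a three-instruction window and, when it sees two
adjacent stores followed by an independent ALU instruction, moves that
instruction between the stores.  This is not gcc machine-description
syntax; the Insn tuple, the opcode names, and separate_stores are all
hypothetical, and the pass deliberately refuses to move loads (possible
memory aliasing) or anything that writes a register the store reads.

```python
from typing import List, NamedTuple, Optional, Tuple


class Insn(NamedTuple):
    op: str                  # "st", "ld", or an ALU opcode such as "add"
    dest: Optional[str]      # register written, None for stores
    srcs: Tuple[str, ...]    # registers read


def separate_stores(code: List[Insn]) -> List[Insn]:
    """Return a copy of `code` with some adjacent store pairs split up."""
    out = list(code)
    i = 0
    while i + 2 < len(out):
        a, b, c = out[i], out[i + 1], out[i + 2]
        movable = (
            c.op not in ("st", "ld")    # only plain ALU ops are moved
            and c.dest not in b.srcs    # must not clobber what the store reads
        )
        if a.op == "st" and b.op == "st" and movable:
            out[i + 1], out[i + 2] = c, b   # st, st, alu -> st, alu, st
        i += 1
    return out


s1 = Insn("st", None, ("r1", "r10"))
s2 = Insn("st", None, ("r2", "r11"))
add = Insn("add", "r3", ("r4", "r5"))
print([x.op for x in separate_stores([s1, s2, add])])
```

A window this small misses many cases, which is consistent with the
point above that peepholes would not be a complete solution.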

-John
-- 
...................................
John Bunda            UT CS Dept.      
bunda@cs.utexas.edu   Austin, Texas

david@sun.com (J.R. ``Bob'' Dobbs) (05/03/89)

In article <8905030323.AA27122@jato.Jpl.Nasa.Gov> earle@JATO.JPL.NASA.GOV
(Greg Earle - Sun Software Support) writes:
>On the other hand, the new SPARCstation machines have a write-through cache.
>Both write buffers are only one write deep.  On the SPARCstation-1, it is
>32 bits deep; on the SPARCstation/SPARCserver 300 series, it is 64 bits deep.
>Because of this, back-to-back writes can tie up the memory bus.

For the same reason, you shouldn't schedule a load right after a store.

-- 
David DiGiacomo, Sun Microsystems, Mt. View, CA  sun!david david@sun.com

grunwald@flute.cs.uiuc.edu (05/03/89)

I've been curious about the talk of scheduling code mentioned here and
elsewhere.

Other than the possibility of schedules affecting register allocation,
is there any reason to do schedules in the compiler? Would this be
better supported by the assembler and/or an object-code-to-object-code
scheduler?

How is this done in the MIPS compiler?
--
Dirk Grunwald
Univ. of Illinois
grunwald@flute.cs.uiuc.edu

brooks@vette.llnl.gov (Eugene Brooks) (05/04/89)

In article <GRUNWALD.89May3093539@flute.cs.uiuc.edu> grunwald@flute.cs.uiuc.edu writes:
>
>I've been curious about the talk of scheduling code mentioned here and
>elsewhere.
>
>Other than the possibility of schedules affecting register allocation,
>is there any reason to do schedules in the compiler? Would this be
>better supported by the assembler and/or an object-code-to-object-code
>scheduler?

The key problem for a "post processor" scheduler is the compiler reusing
a register in a way which prevents efficient scheduling.  For instance,
given the following code on a load/store architecture
	a = b + c;
	d = e + f;
GCC will emit something along the lines of
	load r0,c
	load r1,b
	add r0,r1
	store r0,a
	load r0,f
	load r1,e
	add r0,r1
	store r0,d
from which a scheduler has a hard time extracting much, due to the
resource conflicts generated by the register allocation.
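
The conflict can be checked mechanically.  The sketch below is a minimal
register-dependence test, assuming each instruction is described only by
the set of registers it writes and the set it reads (memory dependences
are ignored, on the assumption that a and f are distinct variables).  The
function name conflicts and the tuple layout are made up for this
illustration.

```python
def conflicts(earlier, later):
    """True if `later` cannot be hoisted above `earlier` on register grounds.

    Each instruction is a (writes, reads) pair of register-name sets.
    """
    e_writes, e_reads = earlier
    l_writes, l_reads = later
    return bool(
        e_writes & l_reads       # true dependence (read after write)
        or e_reads & l_writes    # anti dependence (write after read)
        or e_writes & l_writes   # output dependence (write after write)
    )


store_a = (set(), {"r0"})          # store r0,a   reads r0
load_f_reused = ({"r0"}, set())    # load r0,f    overwrites r0
load_f_fresh = ({"r2"}, set())     # load r2,f    uses a different register

print(conflicts(store_a, load_f_reused))  # True
print(conflicts(store_a, load_f_fresh))   # False
```

With r0 reused, the second load cannot move above the first store, so
the two statements are serialized even though they share no data.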

We faced this problem for a simulated load store architecture which had
quite a few registers, using a PCC based compiler, and solved it by
changing the scratch register allocator to run "round robin" around the
available registers when looking for one.  I doubt it is "optimal" but
the simple heuristic works very well in practice.  A post processing optimizer
now does not run into the conflict problem above, and in fact can be taught
to do all kinds of clever common subexpression and redundant load removals
easily with simple pattern matching.  For our simulated architecture the
operations were of the form op dest,op1,op2 which did not destroy register
contents.  This was quite useful in improving optimizer effectiveness.
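
A minimal sketch of the "round robin" heuristic follows.  It is not the
PCC or GCC allocator; the class name, the register names, and the
alloc/free interface are assumptions made for illustration.  The only
idea taken from the description above is that the search for a free
scratch register resumes just past the last one handed out, instead of
always starting again from the first register.

```python
class RoundRobinScratch:
    """Hand out scratch registers in rotating order."""

    def __init__(self, regs):
        self.regs = list(regs)
        self.next = 0        # index at which the next search starts
        self.busy = set()    # registers currently allocated

    def alloc(self):
        n = len(self.regs)
        for k in range(n):
            idx = (self.next + k) % n
            reg = self.regs[idx]
            if reg not in self.busy:
                self.busy.add(reg)
                self.next = (idx + 1) % n   # resume after this register
                return reg
        raise RuntimeError("out of scratch registers")

    def free(self, reg):
        self.busy.discard(reg)


pool = RoundRobinScratch(["r0", "r1", "r2", "r3"])
first = [pool.alloc(), pool.alloc()]     # temporaries for a = b + c
for reg in first:
    pool.free(reg)
second = [pool.alloc(), pool.alloc()]    # temporaries for d = e + f
print(first, second)  # ['r0', 'r1'] ['r2', 'r3']
```

Because the second statement gets r2 and r3 rather than r0 and r1 again,
a later scheduler is free to overlap the two statements.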

I have checked that you can suitably change the register allocator for GCC.
You do not need many registers for the "round robin" trick to work well;
16 scratch registers show a good effect and 32 is more than you need for
typical code.  Of course, unrolled loops and long pipeline latencies could
use LOTS of registers, but that is what VECTOR registers are for.

brooks@maddog.llnl.gov, brooks@maddog.uucp

mash@mips.COM (John Mashey) (05/09/89)

In article <102542@sun.Eng.Sun.COM> david@sun.com (J.R. ``Bob'' Dobbs) writes:
>In article <8905030323.AA27122@jato.Jpl.Nasa.Gov> earle@JATO.JPL.NASA.GOV
>(Greg Earle - Sun Software Support) writes:
>>On the other hand, the new SPARCstation machines have a write-through cache.
>>Both write buffers are only one write deep.  On the SPARCstation-1, it is
>>32 bits deep; on the SPARCstation/SPARCserver 300 series, it is 64 bits deep.
>>Because of this, back-to-back writes can tie up the memory bus.
>
>For the same reason, you shouldn't schedule a load right after a store.

Hmmm.  Surely this isn't as important as spreading the stores, i.e.,
you'd expect that most loads will hit in the cache, and hence will not
conflict with stores,  unless there's something very unusual in the
memory/cache design.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

mash@mips.COM (John Mashey) (05/09/89)

In article <GRUNWALD.89May3093539@flute.cs.uiuc.edu> grunwald@flute.cs.uiuc.edu writes:
>
>I've been curious about the talk of scheduling code mentioned here and
>elsewhere.
>
>Other than the possibility of schedules affecting register allocation,
>is there any reason to do schedules in the compiler? Would this be
>better supported by the assembler and/or an object-code-to-object-code
>scheduler?
>
>How is this done in the MIPS compiler?

1) Most of the code scheduling is done in the assembler.
2) The code generator and assembler have been tuned to interact well.
For example, the classic "round-robin" use of temporaries is used, and
that seems to work pretty well with the current pipelines.
3) Code generators, and/or assembly code can turn code reordering on/off.
This is necessary, for example, for CPU diagnostics.  [The diags folks
went berserk the first time they tried to write a specific sequence,
and had it munged about!]
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086