earle@JATO.JPL.NASA.GOV (Greg Earle - Sun Software Support) (05/03/89)
For anyone (FSF, whomever) considering porting gcc to the new SPARC-based Sun workstations that we just released (SPARCstation-1 a.k.a. Sun-4/60; and the SPARCstation/SPARCserver 300 series, a.k.a. Sun-4/330, Sun-4/370, and 390) you may be interested in the following fact:

The Sun-4/260 has a write-back cache, and thus it is not bothered by any back-to-back writes done by a program.

On the other hand, the new SPARCstation machines have a write-through cache. Both write buffers are only one write deep. On the SPARCstation-1, it is 32 bits deep; on the SPARCstation/SPARCserver 300 series, it is 64 bits deep. Because of this, back-to-back writes can tie up the memory bus.

Sun's next version of the FORTRAN compiler was modified to know about this. I don't believe the C compiler was, however.

Given this knowledge, if it is possible to modify the SPARC version of gcc to know about this somehow, I hope that this provides enough information to be able to work around this somehow. I don't know enough about compiler technology to know how one would recognize a back-to-back write taking place in code generation; I hope someone out there does.

- Greg Earle
  Sun Microsystems, Inc. JPL on-site Software Support
  earle@Sun.COM
  earle@mahendo.JPL.NASA.GOV (Guest)
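[Editor's illustration] The effect described in the post above can be pictured with a toy timing model of a one-entry write buffer. This is only a sketch: the one-instruction-per-cycle issue rate, the three-cycle drain time (`drain_cycles`), and the function name are assumptions chosen for illustration, not figures for any Sun machine.

```python
def store_stall_cycles(ops, drain_cycles=3):
    """Count cycles lost waiting on a one-entry write buffer.

    ops: sequence of opcode strings; anything other than "store" is
         treated as an instruction that never touches the buffer.
    drain_cycles: hypothetical number of cycles the buffer needs to
         push one write out to memory.
    """
    busy_until = 0  # first cycle at which the buffer is free again
    cycle = 0
    stalls = 0
    for op in ops:
        if op == "store":
            if cycle < busy_until:
                # Buffer still draining the previous write: wait for it.
                stalls += busy_until - cycle
                cycle = busy_until
            busy_until = cycle + drain_cycles
        cycle += 1
    return stalls


back_to_back = ["store", "store", "add", "add"]
spread_out = ["store", "add", "add", "store"]
print(store_stall_cycles(back_to_back), store_stall_cycles(spread_out))
```

Under these assumed numbers the back-to-back ordering stalls while the spread-out ordering does not, which is the kind of difference a scheduler could try to exploit.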
bunda@cs.utexas.edu (John Bunda) (05/03/89)
In article <8905030323.AA27122@jato.Jpl.Nasa.Gov>, earle@JATO.JPL.NASA.GOV (Greg Earle - Sun Software Support) writes:
> [is it possible to modify gcc to avoid back-to-back writes?]

I have been working on scheduling code to allow gcc to reorder instructions to deal with this problem in general, but that doesn't help at the moment.

Short of hacking gcc proper, it *might* be possible to catch some cases with peepholes in the machine description, but this would not be a complete (or satisfying) solution.

-John
--
...................................
John Bunda UT CS Dept. bunda@cs.utexas.edu Austin, Texas
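[Editor's illustration] To give a feel for what a peephole-style pass can and cannot catch, here is a minimal sketch that scans a straight-line instruction list for adjacent stores and, where the very next instruction is independent, slides it between them. It is not gcc's machine-description peephole mechanism; the `Insn` record, the helper names, and the deliberately conservative dependence test (no memory operation is ever moved past another) are all assumptions made for the example.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    op: str            # "load", "store", "add", ...
    reads: frozenset   # registers read
    writes: frozenset  # registers written


def is_mem(insn):
    return insn.op in ("load", "store")


def independent(a, b):
    """True if a and b may be swapped (conservative for memory ops)."""
    if is_mem(a) and is_mem(b):
        return False
    return not (a.writes & b.reads or a.reads & b.writes or a.writes & b.writes)


def find_back_to_back_stores(insns):
    """Return every index i where insns[i] and insns[i + 1] are both stores."""
    return [i for i in range(len(insns) - 1)
            if insns[i].op == "store" and insns[i + 1].op == "store"]


def split_store_pairs(insns):
    """Move one independent instruction between adjacent stores when possible."""
    out = list(insns)
    i = 0
    while i + 2 < len(out):
        if (out[i].op == "store" and out[i + 1].op == "store"
                and independent(out[i + 1], out[i + 2])):
            out[i + 1], out[i + 2] = out[i + 2], out[i + 1]
        i += 1
    return out


seq = [
    Insn("store", frozenset({"r0"}), frozenset()),
    Insn("store", frozenset({"r1"}), frozenset()),
    Insn("add", frozenset({"r3", "r4"}), frozenset({"r2"})),
]
print([insn.op for insn in split_store_pairs(seq)])
```

A window this small only helps when a suitable instruction happens to sit right after the pair, which is consistent with the remark that peepholes would not be a complete solution.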
david@sun.com (J.R. ``Bob'' Dobbs) (05/03/89)
In article <8905030323.AA27122@jato.Jpl.Nasa.Gov> earle@JATO.JPL.NASA.GOV (Greg Earle - Sun Software Support) writes:
>On the other hand, the new SPARCstation machines have a write-through cache.
>Both write buffers are only one write deep. On the SPARCstation-1, it is
>32 bits deep; on the SPARCstation/SPARCserver 300 series, it is 64 bits deep.
>Because of this, back-to-back writes can tie up the memory bus.

For the same reason, you shouldn't schedule a load right after a store.

--
David DiGiacomo, Sun Microsystems, Mt. View, CA sun!david david@sun.com
grunwald@flute.cs.uiuc.edu (05/03/89)
I've been curious about the talk of scheduling code mentioned here and elsewhere.

Other than the possibility of schedules affecting register allocation, is there any reason to do schedules in the compiler? Would this be better supported by the assembler and/or an object-code-to-object-code scheduler?

How is this done in the MIPS compiler?

--
Dirk Grunwald
Univ. of Illinois
grunwald@flute.cs.uiuc.edu
brooks@vette.llnl.gov (Eugene Brooks) (05/04/89)
In article <GRUNWALD.89May3093539@flute.cs.uiuc.edu> grunwald@flute.cs.uiuc.edu writes:
>
>I've been curious about the talk of scheduling code mentioned here and
>elsewhere.
>
>Other than the possibility of schedules affecting register allocation,
>is there any reason to do schedules in the compiler? Would this be
>better supported by the assembler and/or an object-code-to-object-code
>scheduler?

The key problem for a "post processor" scheduler is the compiler reusing a register in a way which prevents efficient scheduling. For instance, given the following code and a load/store architecture

	a = b + c;
	d = e + f;

GCC will emit something along the lines of

	load r0,c
	load r1,b
	add r0,r1
	store r0,a
	load r0,f
	load r1,e
	add r0,r1
	store r0,d

from which a scheduler has a hard time getting much, due to the resource conflicts generated by the register allocation. We faced this problem for a simulated load/store architecture which had quite a few registers, using a PCC based compiler, and solved it by changing the scratch register allocator to run "round robin" around the available registers when looking for one. I doubt it is "optimal" but the simple heuristic works very well in practice. A post processing optimizer now does not run into the conflict problem above, and in fact can be taught to do all kinds of clever common subexpression and redundant load removals easily with simple pattern matching.

For our simulated architecture the operations were of the form

	op dest,op1,op2

which did not destroy register contents. This was quite useful in improving optimizer effectiveness. I have checked that you can suitably change the register allocator for GCC. You do not need many registers for the "round robin" trick to work well; 16 scratch registers show a good effect and 32 is more than you need for typical code. Of course, unrolled loops and long pipeline latencies could use LOTS of registers, but that is what VECTOR registers are for.
brooks@maddog.llnl.gov, brooks@maddog.uucp
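[Editor's illustration] The "round robin" heuristic described in the post above can be sketched as follows. This is not the PCC or GCC allocator; the class names, the four-register pool, and the `emit_sum` helper are assumptions made up for the example. It contrasts an allocator that always reuses the lowest free scratch register with one that keeps cycling through the pool, using the post's `a = b + c; d = e + f;` fragment.

```python
class LowestFree:
    """Always hand out the lowest-numbered free scratch register."""

    def __init__(self, n):
        self.free = list(range(n))

    def alloc(self):
        self.free.sort()
        return self.free.pop(0)

    def release(self, reg):
        self.free.append(reg)


class RoundRobin:
    """Cycle through the pool so recently released registers are reused last."""

    def __init__(self, n):
        self.n = n
        self.next = 0
        self.in_use = set()

    def alloc(self):
        for _ in range(self.n):
            reg = self.next
            self.next = (self.next + 1) % self.n
            if reg not in self.in_use:
                self.in_use.add(reg)
                return reg
        raise RuntimeError("out of scratch registers")

    def release(self, reg):
        self.in_use.discard(reg)


def emit_sum(allocator, dst, x, y):
    """Emit two-address code for dst = x + y, in the style of the post."""
    r0 = allocator.alloc()
    r1 = allocator.alloc()
    code = [f"load r{r0},{y}", f"load r{r1},{x}",
            f"add r{r0},r{r1}", f"store r{r0},{dst}"]
    allocator.release(r0)
    allocator.release(r1)
    return code


for allocator in (LowestFree(4), RoundRobin(4)):
    code = emit_sum(allocator, "a", "b", "c") + emit_sum(allocator, "d", "e", "f")
    print(type(allocator).__name__, code)
```

With the lowest-free policy both statements use r0 and r1, so a later scheduling pass cannot overlap them; with round robin the second statement lands in r2 and r3 and the two sequences no longer share a register.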
mash@mips.COM (John Mashey) (05/09/89)
In article <102542@sun.Eng.Sun.COM> david@sun.com (J.R. ``Bob'' Dobbs) writes:
>In article <8905030323.AA27122@jato.Jpl.Nasa.Gov> earle@JATO.JPL.NASA.GOV
>(Greg Earle - Sun Software Support) writes:
>>On the other hand, the new SPARCstation machines have a write-through cache.
>>Both write buffers are only one write deep. On the SPARCstation-1, it is
>>32 bits deep; on the SPARCstation/SPARCserver 300 series, it is 64 bits deep.
>>Because of this, back-to-back writes can tie up the memory bus.
>
>For the same reason, you shouldn't schedule a load right after a store.

Hmmm. Surely this isn't as important as spreading the stores, i.e., you'd expect that most loads will hit in the cache, and hence will not conflict with stores, unless there's something very unusual in the memory/cache design.

--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com
DDD: 408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
mash@mips.COM (John Mashey) (05/09/89)
In article <GRUNWALD.89May3093539@flute.cs.uiuc.edu> grunwald@flute.cs.uiuc.edu writes:
>
>I've been curious about the talk of scheduling code mentioned here and
>elsewhere.
>
>Other than the possibility of schedules affecting register allocation,
>is there any reason to do schedules in the compiler? Would this be
>better supported by the assembler and/or an object-code-to-object-code
>scheduler?
>
>How is this done in the MIPS compiler?

1) Most of the code scheduling is done in the assembler.
2) The code generator and assembler have been tuned to interact well. For example, the classic "round-robin" use of temporaries is used, and that seems to work pretty well with the current pipelines.
3) Code generators and/or assembly code can turn code reordering on/off. This is necessary, for example, for CPU diagnostics. [The diags folks went berserk the first time they tried to write a specific sequence, and had it munged about!]

--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com
DDD: 408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086