rcg@lpi.liant.com (Rick Gorton) (03/13/91)
Fair warning, this is a fairly lengthy response.

Peter Van Roy writes:
>
> I am in the process of retargeting a compiler for the SPARC. I am building
> an instruction reordering stage. To achieve the best performance, I need
> information about the memory system and the pipeline structure of several
> implementations of the SPARC.

There is good news and bad news. Bad news first.

The bad news is that the pipelining and instruction timing characteristics
depend upon which silicon manufacturer built the chip, and in particular,
which chipset was used. If you can GUARANTEE that all SPARCstation 1+
machines use chipset X and all SPARCstation 2's use chipset Y, and you don't
care at all about possibly not having optimal performance on other chipsets,
then getting the information is merely a matter of talking to the particular
chip manufacturer for the SPARCstation 1+ for the 1+ info, and to the chip
manufacturer of the 2 for the 2 information. It MAY actually be that
different firms are manufacturing the CPUs.

The following is from a post by Michael Slater of Microprocessor Report.
He posted this to comp.arch on Dec. 31, 1990:

] LSI Logic's "Lightning" SPARC processor. Five-chip superscalar
] implementation, dispatches up to four instructions per clock. Uses
] out-of-order instruction execution, speculative execution, and register
] relabeling.
]
] Texas Instruments' "Viking" SPARC processor. Superscalar and superpipelined,
] dispatches up to three instructions per clock. On-chip caches approximately
] 16 Kbytes each for instructions and data.
]
] Cypress/ROSS Technology's "Pinnacle" SPARC processor. Superscalar, dispatches
] up to two instructions per clock cycle. On chip cache approximately 16
] Kbytes, external MMU and controller for second-level cache.
]
] SPARC processors combining existing integer and floating-point units from
] Fujitsu and LSI Logic.

The good news is that there is SOME information in the SPARC Architecture
manual (Version 7) about Instruction scheduling. I can't seem to find the
specific section number right now, but the gist of it (as I recall it) was
that the IU and FPU can execute instructions simultaneously. Which means
that you can get a win by scheduling IU instructions alternately with FPU
instructions.

Now for specifics (where I have info)

> How many cycles are needed to do a load and a store?
> Is there any advantage (apart from needing only a single instruction
> fetch) to the double-word loads and stores?

                        Cycle Times
CHIP                 LD    LDD    ST    STD
LSI L64811:           2     3      3     4
Cypress CY7C601:      2     3      3     4
Fujitsu MB86901:      2     3      3     4

The better news is that, yes, these 3 chipsets all happen to have the same
cycle times. But you cannot guarantee this to be true in the future. It
will be messy to write an instruction scheduler for a compiler which can
generate differently scheduled code for different chipsets by merely using a
different compile-time switch.

I think you will find that your biggest performance gains will be in
scheduling to fill stalls created by the slower floating point instructions,
FDIV, FMUL, and FSQRT.

Hope this helps.

Richard Gorton               rcg@lpi.liant.com  (508) 626-0006
Language Processors, Inc.    Framingham, MA 01760
--
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.

moss@cs.umass.edu (Eliot Moss) (03/13/91)
In article <9103122023.AA14689@lpi.liant.com> rcg@lpi.liant.com (Rick Gorton) writes:
[re optimizing code for a Sparc's pipeline]
> The better news is that, yes, these 3 chipsets all happen to have the same
> cycle times. But you cannot guarantee this to be true in the future. It
> will be messy to write an instruction scheduler for a compiler which can
> generate differently scheduled code for different chipsets by merely using a
> different compile-time switch.

Actually, this may not be true of the new improved gcc (v 2.0) with its
instruction scheduling. Since it is driven by essentially tabular
information, it *might* be possible to switch tables based on a switch.
Michael Tiemann could probably say how hard it would be. It would certainly
be easy to generate different versions of the compiler with changes only
to the machine description information used for instruction scheduling.
--
J. Eliot B. Moss, Assistant Professor
Department of Computer and Information Science
Lederle Graduate Research Center
University of Massachusetts
Amherst, MA 01003
(413) 545-4206, 545-1249 (fax); Moss@cs.umass.edu