mash@mips.COM (John Mashey) (05/16/89)
There have been a number of discussions on such issues of late. Here are a few useful papers, both from ASPLOS III (ACM SIGPLAN Notices, May 1989):

Norm Jouppi & David Wall, "Available Instruction-Level Parallelism for Superscalar and Superpipelined Machines", pp. 272-282.
Michael Smith, Mike Johnson, Mark Horowitz, "Limits on Multiple Instruction Issue", pp. 290-302.

These papers:
- are clearly written, with enough introductory material to be accessible;
- are very relevant to topics upon which leading-edge RISC micro folks are working frenziedly;
- have DATA.

-- 
-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
gideony@microsoft.UUCP (Gideon Yuval) (05/17/89)
Hsu's Ph.D. thesis (U. of Illinois at Urbana, 1/86, "Highly Concurrent Parallel Processing") also seems relevant here.

Gideon Yuval, gideony@microsof.UUCP, 206-882-8080 (fax: 206-883-8101; TWX: 160520)
(TEMPORARY home 'phone: -883-8039)
mcg@mipon2.intel.com (05/30/89)
In article <19755@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>There have been a number of discussions on such issues of late.
>Here are a few useful papers, both from ASPLOS III (ACM SIGPLAN Notices,
>May 1989):
[mention of Wall & Jouppi, and Smith, Johnson & Horowitz]

Don't forget Glenn Hinton's Spring Compcon paper, which discusses the microarchitecture and implementation strategy for Intel's next-generation 960. It can dispatch instructions at a sustained rate of 2 instructions per clock (i.e. 0.5 cycles/instruction), and can thus execute certain algorithms (matrix multiply, Bresenham, FFT) from the on-chip instruction cache at 66 native MIPS.

The architecture can dispatch up to three instructions in any cycle: one normal register-register instruction, one memory instruction (load or store), and one branch. The implementation includes static branch prediction, 8+ frames (of 16) of cached registers, and a number of other features.

It is no accident that both papers mentioned by John Mashey conclude that 2-3 instructions/cycle is the practical limit for this generation of superscalar processors. Expect to see Intel announce the first true superscalar microprocessor implementation sometime this year.

If you want to hear more about it, come to my talk at the "Hot Chips" IEEE symposium at Stanford later in June. I will be discussing the architectural support in the 960 for superscalar implementations, as well as the microarchitecture that implements these features.

S. McGeady
Intel Corp.
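The three-slot dispatch rule described above can be sketched as a greedy, in-order packing of an instruction stream into cycles. This is a minimal Python sketch under stated assumptions, not Intel's actual issue logic: the instruction classes, the names `Insn`, `issue_groups` and `native_mips`, and all register names are hypothetical. A cycle may hold at most one instruction of each class, and an instruction that reads or writes a register already written earlier in the same cycle waits for the next cycle. The trace is assumed to be the already-resolved dynamic instruction stream, so branches never cut a group short. The 33 MHz clock is also an assumption, chosen because 66 native MIPS at 2 instructions per clock implies it.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical instruction classes, one issue slot each per cycle.
SLOTS = ("reg", "mem", "branch")


@dataclass(frozen=True)
class Insn:
    kind: str                   # one of SLOTS
    dst: Optional[str] = None   # register written, if any
    srcs: Tuple[str, ...] = ()  # registers read


def issue_groups(trace: List[Insn]) -> List[List[Insn]]:
    """Greedy in-order packing (assumed model): one instruction per slot kind
    per cycle; no reading or rewriting a register written in the same cycle."""
    groups: List[List[Insn]] = []
    current: List[Insn] = []
    used: Set[str] = set()     # slot kinds already taken this cycle
    written: Set[str] = set()  # registers written this cycle
    for insn in trace:
        if insn.kind not in SLOTS:
            raise ValueError(f"unknown instruction kind: {insn.kind}")
        conflict = (
            insn.kind in used
            or any(r in written for r in insn.srcs)
            or (insn.dst is not None and insn.dst in written)
        )
        if conflict:
            # Close the current cycle and start a new one with this insn.
            groups.append(current)
            current, used, written = [], set(), set()
        current.append(insn)
        used.add(insn.kind)
        if insn.dst is not None:
            written.add(insn.dst)
    if current:
        groups.append(current)
    return groups


def native_mips(n_insns: int, n_cycles: int, clock_mhz: float) -> float:
    """Instructions per cycle times clock rate in MHz."""
    return n_insns / n_cycles * clock_mhz


# Hypothetical array-summing loop body (illustrative registers, not 960 assembly).
trace = [
    Insn("mem", "r4", ("r8",)),         # load element at pointer r8 into r4
    Insn("reg", "r8", ("r8",)),         # advance pointer
    Insn("reg", "r5", ("r5", "r4")),    # accumulate; reg slot already used -> next cycle
    Insn("branch", None, ("r8",)),      # loop test/branch shares that cycle
]

groups = issue_groups(trace)
print(len(groups), "cycles,", native_mips(len(trace), len(groups), 33.0), "MIPS")
# prints: 2 cycles, 66.0 MIPS
```

With this four-instruction loop body the sketch packs two instructions per cycle, matching the sustained 2 instructions/clock figure; a run of back-to-back loads, or a chain of dependent register operations, pushes it toward 1 instruction per cycle.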