[comp.arch] Multiple Instruction Issue & Low-Level Parallelism

mash@mips.COM (John Mashey) (05/16/89)

There have been a number of discussions on such issues of late.
Here are a few useful papers, both from ASPLOS III (ACM SIGPLAN Notices,
May 1989):

Norm Jouppi & David Wall, "Available Instruction-Level Parallelism
for Superscalar and Superpipelined Machines", 272-282.

Michael Smith, Mike Johnson, Mark Horowitz, "Limits on Multiple
Instruction Issue", 290-302.

These: 
	are clearly written, with enough introductory material
		to be accessible.
	are very relevant to topics upon which leading-edge RISC micro
		folks are working frenziedly.
	have DATA.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

gideony@microsoft.UUCP (Gideon Yuval) (05/17/89)

Hsu's Ph.D. Thesis (1/86, U of Ill @Urbana, "Highly Concurrent
Parallel Processing") also seems relevant here.

Gideon Yuval, gideony@microsof.UUCP, 206-882-8080 (fax:206-883-8101;TWX:160520)
                                             (TEMPORARY home 'phone: -883-8039)

mcg@mipon2.intel.com (05/30/89)

In article <19755@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>There have been a number of discussions on such issues of late.
>Here are a few useful papers, both from ASPLOS III (ACM SIGPLAN Notices,
>May 1989): [mention of Wall & Jouppi, and Smith, Johnson & Horowitz]

Don't forget Glenn Hinton's Spring Compcon paper, which discusses the
microarchitecture and implementation strategy for Intel's next generation
960.  It is capable of dispatching instructions at a sustained rate of
2 instructions per clock (i.e. 0.5 cycles/instruction), and thus capable
of executing certain algorithms (matrix mul, bresenham, fft) from on-chip
instruction cache at a rate of 66 native MIPS.  The architecture can
dispatch up to three instructions in any cycle: one normal register-register
instruction, one memory instruction (load or store), and one branch.
The implementation includes static branch prediction, 8+ frames (of 16)
of cached registers, and a number of other features.
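
The dispatch rule and the MIPS figure above can be sketched as a toy
model. This is a rough illustration, not the 960's actual issue logic:
it assumes strictly in-order dispatch, at most one instruction of each
class (register-register, memory, branch) per cycle, and no data
dependencies or cache misses. The names Kind, count_cycles and
native_mips are made up for the example, and the 33 MHz clock is only
inferred from 66 MIPS at 2 instructions/clock; the post does not state
a clock rate.

```python
from enum import Enum


class Kind(Enum):
    """The three instruction classes named in the post."""
    REG = "reg"        # normal register-register instruction
    MEM = "mem"        # load or store
    BRANCH = "branch"  # branch


def count_cycles(stream):
    """Count cycles under a simplified in-order dispatch rule.

    Each cycle takes instructions from the head of the stream until
    the next one belongs to a class already dispatched this cycle.
    Data dependencies are ignored (an assumption of this sketch).
    """
    cycles = 0
    i = 0
    while i < len(stream):
        used = set()
        while i < len(stream) and stream[i] not in used:
            used.add(stream[i])
            i += 1
        cycles += 1
    return cycles


def native_mips(instructions, cycles, clock_mhz):
    """Native MIPS = (instructions per cycle) * (clock in MHz)."""
    return instructions / cycles * clock_mhz


# Hypothetical inner loop: two loads, three ALU ops, one branch.
loop = [Kind.MEM, Kind.REG, Kind.MEM, Kind.REG, Kind.REG, Kind.BRANCH]

cycles = count_cycles(loop)
print(cycles, len(loop) / cycles, native_mips(len(loop), cycles, 33.0))
```

In this model the six-instruction loop dispatches in three cycles,
i.e. 2 instructions/clock. A run of four register-register
instructions would take four cycles, which is one way to see why the
sustained rate depends on the instruction mix even though the peak is
three per cycle.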

It is no accident that both papers mentioned by John Mashey conclude that
2-3 instructions/cycle is the practical limit for this generation of
superscalar processors.  Expect to see Intel announce the first true
superscalar microprocessor implementation sometime this year.

If you want to hear more about it, come to my talk at the "Hot Chips"
IEEE symposium at Stanford later in June.  I will be discussing the
architectural support in the 960 for superscalar implementations, as
well as the microarchitecture that implements these features.

S. McGeady
Intel Corp.