[comp.arch] Sustained Performance Computer Architecture

ad@romeo.cs.duke.edu (Apostolos Dollas) (04/25/89)

A new general-purpose computer architecture with sustained instruction pipeline 
performance (regardless of program flow changes) has been developed at the 
Digital Systems Laboratory of the Department of Electrical Engineering
at Duke University. The features of the new architecture will be presented
on Wednesday, April 26. The presentation will be at the Teer Auditorium,
on Science Drive (Duke West Campus), at 2:00pm; it will be followed at 
3:00pm by a demonstration of the first proof-of-concept prototype of the 
architecture and refreshments. All those interested are welcome to attend. 

Since many netters will not be able to attend, here is a brief summary:

The execution unit (EU) of the new architecture can be that of any typical
von Neumann computer. The instruction fetch and decode operations are
performed by multiple instruction decode units (IDUs), which prefetch all
potentially needed instructions. Knowledge of which code segments may be
needed during any portion of the execution is extracted at compilation time
in the form of a program flow graph. This information is used during
execution by the so-called ``Program Execution Controller'', which preloads
segments into the IDUs before the EU needs them. The decoded segments are
ready for execution at all times, hence there are no bubbles in the
instruction pipeline, and no I-cache misses.
Detailed simulations on existing architectures (such as the Motorola 68020 and
the IBM ROMP) were used to determine speedups for existing benchmark programs.
Typical program speedups from eliminating branch delays (conditional and
unconditional branches, subroutine calls and returns) range from 10% to over
50%. These figures do not account for the cache misses that the new
architecture avoids, so actual performance is better than stated above. The
new architecture is independent of instruction set and technology.
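
The preloading idea in this summary can be illustrated with a toy model.
This is only a sketch under stated assumptions, not the Duke design: the
flow graph, the segment names, the trace, and the rule "one IDU holds the
running segment, the others take its successors in graph order" are all
hypothetical, chosen to show how the number of IDUs interacts with the
fan-out of the program flow graph.

```python
# Toy model only: segment names, graph, trace, and replacement rule are
# hypothetical and are not taken from the architecture described above.
FLOW_GRAPH = {
    "entry": ["loop"],
    "loop": ["body_a", "body_b"],
    "body_a": ["loop", "exit"],
    "body_b": ["loop"],
    "exit": [],
}

# One possible execution path through the graph, segment by segment.
TRACE = ["entry", "loop", "body_b", "loop", "body_a", "exit"]


def count_misses(trace, flow_graph, num_idus):
    """Count segments the EU reaches that no IDU had decoded in advance."""
    held = set()  # segments currently decoded and waiting in IDUs
    misses = 0
    for segment in trace:
        if segment not in held:
            misses += 1
        # One IDU keeps the running segment; the remaining IDUs take its
        # successors in flow-graph order until the IDUs run out.
        successors = [s for s in flow_graph[segment] if s != segment]
        held = {segment} | set(successors[: num_idus - 1])
    return misses


if __name__ == "__main__":
    for n in (2, 3):
        print(n, "IDUs:", count_misses(TRACE, FLOW_GRAPH, n), "misses")
```

In this model, three IDUs leave only the cold-start miss on the first
segment, because no segment has more than two successors. With two IDUs
the same trace also misses on "body_b" and "exit", since only one
successor can be preloaded at a time.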

A proof-of-concept machine has been constructed with one EU and three IDUs,
demonstrating the basic principles of operation of the new architecture.

Apostolos Dollas                  |    ad@duke.cs.duke.edu
Assistant Professor               |
Dept. of Electrical Engineering   |
Duke University                   |
Durham, NC 27706                  |

mark@hubcap.clemson.edu (Mark Smotherman) (04/25/89)

In article <14292@duke.cs.duke.edu>, ad@romeo.cs.duke.edu (Apostolos Dollas) writes:
> A new general-purpose computer architecture with sustained instruction pipeline 
> performance (regardless of program flow changes) has been developed at the 
> ... The instruction fetch and decode operations are 
> performed by multiple instruction decode units (IDU) which prefetch all 
> potentially needed instructions. The knowledge of the segments of code which 
> will potentially be needed during any portion of the execution
> is extracted at compilation time in the form of a program flow graph. 
> 
> Apostolos Dollas                  |    ad@duke.cs.duke.edu

How does this proposed architecture attack the short runlengths between
conditional branches?  Riseman and Foster thought about an infinite
resource machine in '71 or so (IEEE TC paper) and decided that they
could get a mere ten-fold speedup if they allowed 64K (yes, 64*1024)
prefetch paths.  How does compile-time info solve the power-of-two
branch path problem?  Do you profile and add compensation code (as does
Multiflow Trace)?  Do you back out of prefetched instructions when you
decide you have predicted the wrong path (as does HPS)?
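
The power-of-two growth behind this question is simple arithmetic and can
be sketched as follows. The function names and the run length of 5
instructions are assumptions for illustration only; the sketch assumes each
unresolved conditional branch has exactly two outcomes and says nothing
about how the proposed architecture or the Riseman and Foster study
actually counted paths.

```python
def paths_needed(unresolved_branches):
    """Paths to prefetch so every outcome of n two-way branches is covered."""
    return 2 ** unresolved_branches


def branches_covered(prefetch_paths):
    """Largest n such that 2**n paths fit in the given number of paths."""
    if prefetch_paths < 1:
        raise ValueError("need at least one prefetch path")
    return prefetch_paths.bit_length() - 1  # floor(log2(prefetch_paths))


def lookahead_instructions(prefetch_paths, run_length):
    """Instructions of lookahead, given a (hypothetical) average run length
    between conditional branches."""
    return branches_covered(prefetch_paths) * run_length


if __name__ == "__main__":
    paths = 64 * 1024
    print(paths, "paths cover", branches_covered(paths), "pending branches")
    # run_length=5 is an assumed value, not a measured one.
    print("lookahead:", lookahead_instructions(paths, 5), "instructions")
```

Under these assumptions, 64*1024 prefetch paths cover only 16 pending
conditional branches, which is why short runlengths make exhaustive
prefetching so expensive.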

Are you publishing this work?  Comp. Arch. Symp.?  ASPLOS?  IEEE TC?
-- 
Mark Smotherman, Comp. Sci. Dept., Clemson University, Clemson, SC 29634
INTERNET: mark@hubcap.clemson.edu    UUCP: gatech!hubcap!mark

andrew@frip.wv.tek.com (Andrew Klossner) (04/27/89)

[]

	"A new general-purpose computer architecture with sustained
	instruction pipeline performance (regardless of program flow
	changes) has been developed at the Digital Systems Laboratory
	of the Department of Electrical Engineering at Duke University
	... The execution unit (EU) of the new architecture can be that
	of any typical von Neumann computer. The instruction fetch and
	decode operations are performed by multiple instruction decode
	units (IDU) which prefetch all potentially needed instructions
	... The decoded segments are ready for execution at all times,
	hence there are no bubbles in the instruction pipeline, and no
	I-cache misses."

Never an instruction miss?  What happens when executing code for a
switch statement, such as:

	load	r2,jumptable(r1)
	jmp	r2

The number of IDUs required to avoid all instruction misses would seem
to be unbounded.
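
A rough way to see the concern, as a sketch under assumptions rather than a
statement about the actual machine: suppose each IDU holds one decoded
segment, one IDU is occupied by the running segment, and the compiler can
see every entry of the jump table. The label names below are hypothetical.

```python
def idus_needed(jump_table):
    """IDUs needed to preload every target of one indirect jump,
    plus one IDU for the segment containing the jump itself."""
    return 1 + len(set(jump_table))


def uncovered_targets(jump_table, num_idus):
    """Distinct jump targets that cannot be preloaded with num_idus IDUs."""
    distinct = len(set(jump_table))
    return max(0, distinct - (num_idus - 1))


if __name__ == "__main__":
    # Hypothetical switch with five table slots, two of which share "default".
    table = ["case0", "case1", "default", "case3", "default"]
    print("IDUs needed:", idus_needed(table))
    print("uncovered with 3 IDUs:", uncovered_targets(table, 3))
```

Under these assumptions the IDU count grows with the number of distinct
case labels, so a fixed number of IDUs cannot cover an arbitrarily large
switch. If the table contents are not known at compile time, even this
bound is unavailable.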

  -=- Andrew Klossner   (uunet!tektronix!orca!frip!andrew)      [UUCP]
                        (andrew%frip.wv.tek.com@relay.cs.net)   [ARPA]