ad@romeo.cs.duke.edu (Apostolos Dollas) (04/25/89)
A new general-purpose computer architecture with sustained instruction pipeline performance (regardless of program flow changes) has been developed at the Digital Systems Laboratory of the Department of Electrical Engineering at Duke University. The features of the new architecture will be presented on Wednesday, April 26. The presentation will be at the Teer Auditorium, on Science Drive (Duke West Campus), at 2:00pm; it will be followed at 3:00pm by refreshments and a demonstration of the first proof-of-concept prototype of the architecture. All those interested are welcome to attend.

Since many netters will not be able to attend, here is a brief summary:

The execution unit (EU) of the new architecture can be that of any typical von Neumann computer. The instruction fetch and decode operations are performed by multiple instruction decode units (IDU) which prefetch all potentially needed instructions. The knowledge of the segments of code which will potentially be needed during any portion of the execution is extracted at compilation time in the form of a program flow graph. This information is used during execution by the so-called ``Program Execution Controller'', which preloads segments to the IDU's prior to their need by the EU. The decoded segments are ready for execution at all times, hence there are no bubbles in the instruction pipeline, and no I-cache misses.

Detailed simulations on existing architectures (such as the Motorola 68020 and the IBM ROMP) were used to determine speedups for existing benchmark programs. Typical program speedups attributable to branches (conditional, unconditional, and subroutine calls and returns) range from 10% to over 50%. These figures do not account for cache misses, which do not occur in the new architecture, so actual performance should be better than stated above. The new architecture is independent of instruction set and technology.

A proof-of-concept machine has been constructed with one EU and three IDU's, demonstrating the basic principles of operation of the new architecture.

Apostolos Dollas | ad@duke.cs.duke.edu
Assistant Professor |
Dept. of Electrical Engineering |
Duke University |
Durham, NC 27706 |
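The preloading scheme summarized in this post can be illustrated with a rough sketch. Everything in it is an assumption made for illustration, not the Duke design: the flow graph is a dictionary of segment names, one IDU holds the segment being executed, each remaining IDU holds one successor segment, and preloading is treated as instantaneous. The names `count_stalls`, `flow_graph`, and `trace` are hypothetical.

```python
def count_stalls(flow_graph, trace, num_idus):
    """Count transitions where the next segment was not already preloaded.

    Assumed model (not from the original post): one IDU holds the segment
    the EU is executing, and each of the other IDUs is preloaded with one
    successor of that segment, taken in the order the flow graph lists them.
    Preload latency is ignored.
    """
    stalls = 0
    for current, nxt in zip(trace, trace[1:]):
        preloaded = flow_graph[current][: max(num_idus - 1, 0)]
        if nxt not in preloaded:
            stalls += 1
    return stalls


# Hypothetical flow graph: A branches to B or C, both rejoin at D,
# and D either loops back to A or exits to E.
flow_graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["A", "E"],
    "E": [],
}
trace = ["A", "C", "D", "A", "B", "D", "E"]

for idus in (2, 3):
    print(idus, "IDUs:", count_stalls(flow_graph, trace, idus), "stalls")
```

Under these assumptions, three IDUs are enough to cover every two-way branch in the example, while two IDUs leave one arm of each conditional branch unloaded.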
mark@hubcap.clemson.edu (Mark Smotherman) (04/25/89)
In article <14292@duke.cs.duke.edu>, ad@romeo.cs.duke.edu (Apostolos Dollas) writes:
> A new general-purpose computer architecture with sustained instruction pipeline
> performance (regardless of program flow changes) has been developed at the
> ... The instruction fetch and decode operations are
> performed by multiple instruction decode units (IDU) which prefetch all
> potentially needed instructions. The knowledge of the segments of code which
> will potentially be needed during any portion of the execution
> is extracted at compilation time in the form of a program flow graph.
>
> Apostolos Dollas | ad@duke.cs.duke.edu

How does this proposed architecture attack the short runlengths between conditional branches? Riseman and Foster thought about an infinite resource machine in '71 or so (IEEE TC paper) and decided that they could get a mere ten-fold speedup if they allowed 64K (yes, 64*1024) prefetch paths.

How does compile-time info solve the power-of-two branch path problem? Do you profile and add compensation code (as does Multiflow Trace)? Do you back out of prefetched instructions when you decide you have predicted the wrong path (as does HPS)?

Are you publishing this work? Comp. Arch. Symp.? ASPLOS? IEEE TC?

--
Mark Smotherman, Comp. Sci. Dept., Clemson University, Clemson, SC 29634
INTERNET: mark@hubcap.clemson.edu UUCP: gatech!hubcap!mark
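The "power-of-two branch path problem" raised here comes down to simple arithmetic, sketched below under one assumption: every unresolved conditional branch has exactly two arms and both are followed. On that assumption, 64*1024 paths is 2^16, that is, both arms of 16 consecutive unresolved branches. The function names are hypothetical.

```python
def paths_to_cover(unresolved_branches, arms=2):
    """Paths that must be prefetched to follow every arm of each branch."""
    return arms ** unresolved_branches


def branches_covered(paths, arms=2):
    """Largest number of consecutive unresolved branches that a given
    number of prefetch paths can cover completely."""
    depth = 0
    while arms ** (depth + 1) <= paths:
        depth += 1
    return depth


print(paths_to_cover(16))          # paths needed for 16 two-way branches
print(branches_covered(64 * 1024)) # branch depth covered by 64K paths
```

The exponential growth is the point of the question: each extra branch of lookahead doubles the number of paths that must be kept ready.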
andrew@frip.wv.tek.com (Andrew Klossner) (04/27/89)
[]

"A new general-purpose computer architecture with sustained instruction pipeline performance (regardless of program flow changes) has been developed at the Digital Systems Laboratory of the Department of Electrical Engineering at Duke University ... The execution unit (EU) of the new architecture can be that of any typical von Neumann computer. The instruction fetch and decode operations are performed by multiple instruction decode units (IDU) which prefetch all potentially needed instructions ... The decoded segments are ready for execution at all times, hence there are no bubbles in the instruction pipeline, and no I-cache misses."

Never an instruction miss? What happens when executing code for a switch statement, such as:

    load    r2,jumptable(r1)
    jmp     r2

The number of IDUs required to avoid all instruction misses would seem to be unbounded.

-=- Andrew Klossner
(uunet!tektronix!orca!frip!andrew) [UUCP]
(andrew%frip.wv.tek.com@relay.cs.net) [ARPA]
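The objection about switch statements can be made concrete with a small sketch. It assumes, purely for illustration, that each IDU holds exactly one decoded segment and that one more IDU holds the segment currently executing; the post does not say how the real machine assigns segments to IDUs. `idus_needed` and `switch_graph` are hypothetical helpers.

```python
def idus_needed(flow_graph):
    """IDUs needed so that every successor of every segment is preloaded.

    Assumed model: one IDU for the running segment plus one per distinct
    successor of the segment with the largest fan-out.
    """
    widest = max((len(set(succs)) for succs in flow_graph.values()), default=0)
    return widest + 1


def switch_graph(num_cases):
    """Flow graph for an indirect jump through a table of num_cases targets."""
    graph = {"switch": [f"case{i}" for i in range(num_cases)]}
    for i in range(num_cases):
        graph[f"case{i}"] = ["join"]
    graph["join"] = []
    return graph


for cases in (2, 16, 256):
    print(cases, "cases ->", idus_needed(switch_graph(cases)), "IDUs")
```

In this model the IDU count grows linearly with the size of the jump table, so no fixed number of IDUs covers every possible switch.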