[comp.lang.forth] FPGA stack engines

koopman@a.gp.cs.cmu.edu (Philip Koopman) (12/14/90)

 
>         Marty Fraeman writes:
> >Rather than FPGA's exclusively, I would be inclined to use a mixture
> >of LSI type parts like register files, dual port memories, ALU's and
> >some FPGA logic for ``glue''.
> >
> >If one chose the correct parts, the design could be easily migrated to
> >a standard cell or gate array library. (2900 series bit slice
> >components, for example, are available from at least one vendor.)
> Yes you could do this and Phil Koopman already did.  In fact Phil
> migrated his WISC 32 from TTL to a standard cell design while at Harris.
> Perhaps he could comment on performance of the discrete vs integrated
> implementation.

The WISC CPU/16 and CPU/32 were built using a bit-slice approach with
discrete TTL components.  I judged that AMD 2901/2903's were too expensive,
finicky to work with, and just plain overkill.  In particular, the
on-chip register file just didn't meet my requirements.  So, I used
74181/182 ALU slices and 74374 register chips.  The TTL version with
ALS technology ran at 6 MHz for the 32-bit system.
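
To illustrate the bit-slice idea, here is a minimal Python sketch of a
32-bit add assembled from eight 4-bit slices with generate/propagate
carry lookahead, roughly the job a 74181/74182 combination does in
hardware.  It models only addition and ignores the real parts' function
tables and pin polarities.  The names slice_gp, lookahead_carries and
add32 are made up for this example, and nothing here is meant to
describe the actual WISC data path.

```python
SLICE_BITS = 4
NUM_SLICES = 8                      # 8 slices x 4 bits = 32 bits
MASK = (1 << SLICE_BITS) - 1


def slice_gp(a, b):
    """Return (generate, propagate) for one 4-bit slice."""
    total = a + b
    generate = total > MASK         # carry out even with carry-in = 0
    propagate = total == MASK       # carry out only if carry-in = 1
    return generate, propagate


def lookahead_carries(gp, carry_in):
    """Carry into each slice, from c[i+1] = G[i] or (P[i] and c[i]).

    Hardware lookahead flattens this recurrence into two-level logic;
    the loop here just evaluates the same equations in order.
    """
    carries = []
    c = bool(carry_in)
    for g, p in gp:
        carries.append(c)
        c = g or (p and c)
    return carries, c


def add32(a, b, carry_in=False):
    """Add two 32-bit values slice by slice; return (sum, carry_out)."""
    a_sl = [(a >> (SLICE_BITS * i)) & MASK for i in range(NUM_SLICES)]
    b_sl = [(b >> (SLICE_BITS * i)) & MASK for i in range(NUM_SLICES)]
    gp = [slice_gp(x, y) for x, y in zip(a_sl, b_sl)]
    carries, carry_out = lookahead_carries(gp, carry_in)
    result = 0
    for i, (x, y, c) in enumerate(zip(a_sl, b_sl, carries)):
        result |= ((x + y + int(c)) & MASK) << (SLICE_BITS * i)
    return result, carry_out
```

For example, add32(0xFFFFFFFF, 1) wraps to zero with the carry out set.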
 
Porting the design onto 2.5 micron standard cell CMOS at Harris resulted
in about 8 MHz operation for the RTX 32P.  It would have been faster but
for two reasons:
  1) the design still used tristate logic, which works well in discrete
     TTL but is slow in VLSI CMOS (muxes are often faster, and their
     higher package count isn't an issue once everything is on-chip)
  2) the design was partitioned across two chips, with an inter-chip bus.
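
The structural difference between the two bus styles in point 1 can be
sketched behaviorally in Python.  This is only an illustration: it
models which source ends up driving a shared value, not the electrical
loading or timing that makes tristate buses slow in CMOS.  The function
names tristate_bus and mux are invented for the example.

```python
def tristate_bus(drivers):
    """drivers is a list of (output_enable, value) pairs on one bus.

    Returns the bus value, or None if no driver is enabled (floating).
    Raises ValueError if more than one driver is enabled (contention).
    """
    active = [value for enabled, value in drivers if enabled]
    if len(active) > 1:
        raise ValueError("bus contention: multiple drivers enabled")
    return active[0] if active else None


def mux(sources, select):
    """A multiplexer always passes exactly one source to its output."""
    return sources[select]


# Three hypothetical registers feeding one data path.
regs = [0x11, 0x22, 0x33]
bus_value = tristate_bus([(False, regs[0]), (True, regs[1]), (False, regs[2])])
mux_value = mux(regs, 1)
```

With a tristate bus, correctness depends on the enables being mutually
exclusive; with a mux the select code guarantees it by construction.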
 
A major overhaul of the design to take into account good CMOS design
practice resulted in the BINAR chip.  This 2.0 micron standard
cell CMOS chip ran at between 12 and 16 MHz depending on the wafer.
It was single-chip, used muxes instead of tristate buses, and was somewhat
tuned for speed (I'll bet we could have gotten to 20 MHz typical with
further careful tuning).
 
My estimate based on cursory analysis is that an FPGA design is going
to be 2x to 5x slower than a gate array/standard cell design.  One reason
is that most FPGA's aren't architected for CPU design (they are better
at glue logic consolidation).  Another is that there is a tremendous
amount of interconnect capacitance that slows things down.
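
As a back-of-the-envelope illustration only, the 2x to 5x estimate can
be applied to the BINAR clock rates quoted above.  Doing so is an
assumption made for this example (an FPGA would not be built in the
same process), and fpga_clock_range is a made-up helper name.

```python
def fpga_clock_range(asic_mhz, slowdown=(2.0, 5.0)):
    """Return (low, high) FPGA clock estimates in MHz for one
    gate array/standard cell clock rate, given a slowdown range."""
    best_factor, worst_factor = slowdown
    return asic_mhz / worst_factor, asic_mhz / best_factor


for asic_mhz in (12.0, 16.0):
    low, high = fpga_clock_range(asic_mhz)
    print(f"{asic_mhz:.0f} MHz standard cell -> "
          f"{low:.1f} to {high:.1f} MHz FPGA")
```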
 
FPGA's with sea-of-gates architectures in the 10,000-gate range
are just making it to market. Those ought to be quite interesting,
and Charlie Johnsen at MISC is banking on them to get his MISC CPU built.
(He told me he expects to take a big speed hit for using FPGA's,
but he is using the flexibility to reconfigure the CPU on the fly
to get the speed back with application-specific instruction sets.)
Charlie's design shares program memory with stack memory in the same
RAM chips.  It might be possible to have separate stack memories
if FPGA's with on-chip memory come out (a likely possibility
in the coming years).
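
To see why separate stack memories might matter, here is a deliberately
crude Python cycle-count model.  It is not a description of Charlie's
design: it assumes every memory access takes one cycle, that no stack
elements are held in registers, and that a separate stack RAM can be
accessed in parallel with the instruction fetch.  The instruction trace
is hypothetical.

```python
def cycles_shared(stack_accesses_per_instr):
    """One RAM for program and stacks: the instruction fetch and each
    stack access need their own memory cycle."""
    return sum(1 + n for n in stack_accesses_per_instr)


def cycles_separate(stack_accesses_per_instr):
    """Separate program and stack RAMs: stack accesses overlap the
    fetch, so an instruction costs max(1, stack accesses) cycles."""
    return sum(max(1, n) for n in stack_accesses_per_instr)


# Hypothetical trace: number of stack-memory accesses per instruction.
trace = [1, 0, 2, 1]
shared = cycles_shared(trace)
separate = cycles_separate(trace)
print(f"shared RAM: {shared} cycles, separate stack RAM: {separate} cycles")
```

Under these assumptions the gap grows with the share of instructions
that touch stack memory; a real design would shift the numbers
considerably.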
 
  Phil Koopman                koopman@greyhound.ece.cmu.edu   Arpanet
  2525A Wexford Run Rd.
  Wexford, PA  15090
*** this space for rent ***