[comp.lang.forth] FPGA Forth engines

wmb@MITCH.ENG.SUN.COM (12/06/90)

> I'd also welcome any discussion of creating "generic"
> FORTH engines using FPGAs that WOULD BE affordable
> in single quantities.

In calculating the cost, don't forget to amortize the up-front
design and software development cost over the expected volume.
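
To put rough numbers on that, here is the arithmetic as a couple of
lines of Forth (the $50K up-front figure and $20 parts cost are made
up purely for illustration):

    \ per-unit cost = parts cost + up-front cost amortized over volume
    : unit-cost  ( parts-$ upfront-$ volume -- $ )  / + ;
    20 50000 1000 unit-cost .   \ 1000 units: prints 70
    20 50000   10 unit-cost .   \   10 units: prints 5020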

Forth can be ported relatively easily compared to many other
languages, but nothing is free (unless it is being subsidized
by somebody or something else).

Also, remember that selling small quantities is expensive in terms of
support cost per unit sold.

Mitch Bradley, wmb@Eng.Sun.COM

cwpjr@cbnewse.att.com (clyde.w.jr.phillips) (12/07/90)

In article <9012061501.AA20109@ucbvax.Berkeley.EDU>, wmb@MITCH.ENG.SUN.COM writes:
> > I'd also welcome any discussion of creating "generic"
> > FORTH engines using FPGAs that WOULD BE affordable
> > in single quantities.
> 
> In calculating the cost, don't forget to amortize the up-front
> design and software development cost over the expected volume.
> 
> Forth can be ported relatively easily compared to many other
> languages, but nothing is free (unless it is being subsidized
> by somebody or something else).
> 
> Also, remember that selling small quantities is expensive in terms of
> support cost per unit sold.
> 
> Mitch Bradley, wmb@Eng.Sun.COM

Mitch,
	I still think you know that an FPGA or similar device is essentially
like a PAL or EPROM, i.e.

A FORTH engine "design", translated by an FPGA "compiler",
would allow any one of us to purchase the appropriate commodity FPGA
and, with the appropriate FPGA "burner" (like an EPROM burner
but really a glorified PAL programmer), simply roll our own
generic processor.

Reality may demand 2 or 3 different types of "PALs"
to implement a usable "processor", but that will change.

Since I'm speaking of this as a FIG-like project,
i.e. the FORTH engine design being PD and generic,
where each individual who wants a thingie gets a copy
of the design, buys the part, and gets it "blown",
there is no heavy cost on any one participant
once the design work is done.

I *see* this as a real possibility.  I have for years, but
it is seeming more feasible and, with the current state of affairs,
more desirable.

Do you get my point?  I'd like to debate my vision,
not the standard mfg route (which you and I agree upon, I believe).

P.S. FPGAs will be amortized over a HUGE market, so we would essentially
be riding the coattails of the entire industry.
Also, just like FIG, I could see vendors coming along to update
and customize the design for ASIC purposes and to sell support.

Let me know if you are tuned into this and want to flesh it out.
( Open Invitation One & all )
--Clyde

mef@aplcen.apl.jhu.edu (Marty Fraeman) (12/07/90)

Well, as a fellow who's actually built 3 Forth chips, I just couldn't
keep my trap shut any longer on the economics of building processors
and using FPGAs as an implementation technology.

Certainly Mitch's comments regarding the economics of processor
design are valid.  So doing a new Forth chip based solely on the hope
that you'd make money selling the chips themselves seems a pretty
unlikely bet to me.  But other factors should also be considered.  For
example, the argument I've been able to use around here is that we can
build and program a one-of-a-kind embedded system far less expensively
based on a Forth chip than in traditional ways.  The language itself
accounts for some of this naturally, but the chip is needed to give us
acceptable performance at the same time.  In fact this tradeoff is so
dramatic that the processor development effort (especially with the
productivity possible with modern silicon design tools) can pay for
itself on the first project that uses it!

Now onto FPGA built processors.  There is absolutely no doubt in my
mind that you could build a processor of some sort using FPGAs.  I
suspect you could even do something like my favorite machine (the SC32
of course -) although other approaches might better suit the building
blocks in an FPGA (e.g. I suspect Phil's WISC architecture would work
out pretty well).  Ah, but how fast would such a machine be?  Sloooooow
(Well, in my opinion anyway.  I suppose someone'll build it and make me
a liar but until then ...).  The programmable wire routing between
logic elements adds lots of capacitance so lots of electrons need to
wander back and forth and that all takes time.  For example, how come
the RTX2000, a 16 bit standard cell design, and the SC32, a 32 bit
compiled chip, both run at 10 MHz?  The SC32 is a 2u part and I think
the RTX is too.  The data path of the SC32 is essentially a custom
design so wire paths are short and loading is minimal.  The RTX data
path is logic gates lined up in neat rows with lots of computer-placed
wire hooking up the gates to do things.  That slows stuff up (a lot in
the case of the data path).  Yet the RTX standard cells are dense and
close together compared to an FPGA.  So I think you'd be lucky to get
1-5 MHz out of an FPGA-based processor.  At those speeds the fancy new
chips (SPARC, MIPS, AMD29000, M88000, ...) could probably run Forth
almost as fast as your machine, while being cheaper, lower power,
taking less board space, etc.
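
To put rough numbers on that last comparison, here it is as a Forth
back-of-the-envelope.  The 3 MHz figure is just a point inside my
1-5 MHz guess, and the ~10 native instructions per Forth primitive on
a conventional RISC is purely an assumption for illustration:

    \ Forth primitives per second, given a clock rate and the number
    \ of cycles (or native instructions) spent per primitive
    : prims/sec  ( clock-hz cycles/prim -- n )  / ;
     3000000  1 prims/sec .   \ FPGA Forth engine at 3 MHz:  3000000
    25000000 10 prims/sec .   \ 25 MHz RISC, ~10 inst/prim:  2500000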



	Marty Fraeman

	mef@glinda.jhuapl.edu
	301-953-5000, x8360

	Room 13-s587
	Johns Hopkins University/Applied Physics Laboratory
	Johns Hopkins Road
	Laurel, Md. 20723

adyer@milo.wyse.com (Andrew Dyer x2446) (12/11/90)

I don't think your comments are necessarily true. Several vendors have
arrays with approx. 2000 2-input NAND-equivalent gates, which will run
at toggle rates of 70 MHz (Xilinx and AMD).  I wouldn't trust I/O rates
to be more than 50 MHz though.  Assuming a 50 MHz clock and 6 clock
cycles/instruction, you get an 8.33 MHz instruction rate.  Not too shabby.
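
Spelled out in Forth, that's just:

    \ instruction rate, given the clock in kHz and clocks per instruction
    : inst-rate  ( clock-khz clocks/inst -- khz )  / ;
    50000 6 inst-rate .   \ prints 8333, i.e. roughly 8.33 MHz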

The other problem is that FPGAs are expensive, and it would take
several of them if they were the only components. For a one shot
system that's o.k., but if it's to be ``public domain'' hardware, then
it should be a bit simpler (IMHO).

Rather than FPGAs exclusively, I would be inclined to use a mixture
of LSI-type parts like register files, dual port memories, ALUs, and
some FPGA logic for ``glue''.

If one chose the correct parts, the design could be easily migrated to
a standard cell or gate array library. (2900 series bit slice
components, for example, are available from at least one vendor.)

--
{uunet, mips, decwrl}!wyse!adyer or adyer@wyse.com
"One day I asked the angels for inspiration, and the devil bought me a drink.
 He's been buying them ever since. "

cwpjr@cbnewse.att.com (clyde.w.jr.phillips) (12/11/90)

In article <ADYER.90Dec10180623@milo.wyse.com>, adyer@milo.wyse.com (Andrew Dyer x2446) writes:
I think this is in response to Marty Fraeman's remarks that an FPGA engine
would be SLOOOOOOW!:

> I don't think your comments are necessarily true. Several vendors have
> arrays with approx. 2000 2-input NAND-equivalent gates, which will run
> at toggle rates of 70 MHz (Xilinx and AMD).  I wouldn't trust I/O rates
> to be more than 50 MHz though.  Assuming a 50 MHz clock and 6 clock
> cycles/instruction, you get an 8.33 MHz instruction rate.  Not too shabby.
> 
> The other problem is that FPGAs are expensive, and it would take
> several of them if they were the only components. For a one shot
> system that's o.k., but if it's to be ``public domain'' hardware, then
> it should be a bit simpler (IMHO).
> 
> Rather than FPGAs exclusively, I would be inclined to use a mixture
> of LSI-type parts like register files, dual port memories, ALUs, and
> some FPGA logic for ``glue''.

I believe I did reference this issue.  And there are neat sequencer PALs,
register file, ALU, and bus interface chips out there....
> 
> If one chose the correct parts, the design could be easily migrated to
> a standard cell or gate array library. (2900 series bit slice
> components, for example, are available from at least one vendor.)
> 
> --

So at least two people believe it's doable!
--Clyde

mef@aplcen.apl.jhu.edu (Marty Fraeman) (12/12/90)

In article <ADYER.90Dec10180623@milo.wyse.com> adyer@milo.wyse.com (Andrew Dyer x2446) writes:
>I don't think your comments are necessarily true. Several vendors have
>arrays with approx. 2000 2-input NAND-equivalent gates, which will run

Well, let's see now.  Both the SC32 and the RTX2000 family basically have
three separate address spaces that can be accessed each cycle:  main
memory for instructions and data, parameter stack memory, and return
stack memory.  My belief is that this is the key feature needed to make
a high speed Forth engine.  Both Koopman and Hayes have shown that the
stack memories should be at least 16 words deep before overflow
mechanism overhead becomes negligible.  So a 16 bit machine should have
at least 2*16*16 bits of memory tightly coupled to the CPU.  A single
bit of memory takes at least two 2-input NAND gates, so that's about 1K
gates total just for the stacks, or at least half of your FPGA.  If you
take the stacks off the FPGA and put them in static RAM like the Novix
chip did, then you take a big speed hit.  For proof, look at the top
speed of the Novix vs. the RTX2000.  Both were using around 2u technology
(although the Novix was a gate array and the RTX is a standard cell),
yet the RTX is more than twice as fast.
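
Spelling that gate-count estimate out as Forth (the two NAND gates per
bit is just the lower bound above; a real static cell costs more):

    \ on-chip stack storage: 2 stacks * 16 words * 16 bits,
    \ at roughly 2 two-input NAND gates per bit
    : stack-bits   ( -- n )  2 16 * 16 * ;
    : stack-gates  ( -- n )  stack-bits 2 * ;
    stack-gates .   \ prints 1024, i.e. about 1K gates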

>at toggle rates of 70 MHz (Xilinx and AMD).  I wouldn't trust I/O rates
>to be more than 50 MHz though.  Assuming a 50 MHz clock and 6 clock
>cycles/instruction, you get an 8.33 MHz instruction rate.  Not too shabby.
>
Yes, for one flip flop maybe, but what happens when you finish routing
a real circuit?

>The other problem is that FPGAs are expensive, and it would take
>several of them if they were the only components. For a one shot
>system that's o.k., but if it's to be ``public domain'' hardware, then
>it should be a bit simpler (IMHO).
>
>Rather than FPGAs exclusively, I would be inclined to use a mixture
>of LSI-type parts like register files, dual port memories, ALUs, and
>some FPGA logic for ``glue''.
>
>If one chose the correct parts, the design could be easily migrated to
>a standard cell or gate array library. (2900 series bit slice
>components, for example, are available from at least one vendor.)
Yes, you could do this, and Phil Koopman already did.  In fact Phil
migrated his WISC 32 from TTL to a standard cell design while at Harris.
Perhaps he could comment on the performance of the discrete vs.
integrated implementation.


	Marty Fraeman

	mef@aplcen.apl.jhu.edu
	301-953-5000, x8360

	Room 13-s587
	Johns Hopkins University/Applied Physics Laboratory
	Johns Hopkins Road
	Laurel, Md. 20723

adyer@milo.wyse.com (Andrew Dyer x2446) (12/14/90)

In article <1990Dec11.181204.10500@aplcen.apl.jhu.edu> mef@aplcen.apl.jhu.edu (Marty Fraeman) writes:

(mucho stuff deleted)
   the stacks off the FPGA and put them in static RAM like the Novix
   chip did, then you take a big speed hit.  For proof, look at the top
   speed of the Novix vs. the RTX2000.

I am not familiar with the Novix implementation, but I would think
that some of the cache RAMs available today could be used for this
function.

   >at toggle rates of 70 MHz (Xilinx and AMD).  I wouldn't trust I/O rates
   >to be more than 50 MHz though.  Assuming a 50 MHz clock and 6 clock
   >cycles/instruction, you get an 8.33 MHz instruction rate.  Not too shabby.
   >
   Yes, for one flip flop maybe, but what happens when you finish routing
   a real circuit?

As long as everything is synchronous, things shouldn't be too bad.
We have done a couple of tests of these FPGAs at close to 50 MHz.  Routing
was important, and so was on/off chip delay, but after mucking about
some we got them to simulate o.k.  Admittedly it was never built, but
I believe their simulator was reasonably accurate.

(small edit here)
   Perhaps he (Phil Koopman) could comment on the performance of the discrete
   vs. integrated implementation.

I wouldn't expect a discrete version to be able to do more than 75% of
what the integrated version would, but I was thinking of ``public
domain'' hardware and not the release of a commercial product.

--
{uunet, mips, decwrl}!wyse!adyer or adyer@wyse.com
" I think I woke up on the wrong side of the food chain today..."