wmb@MITCH.ENG.SUN.COM (12/06/90)
> I'd also welcome any discussion of creating "generic"
> FORTH engines using FPGA's that WOULD BE affordable
> on singles quantities.

In calculating the cost, don't forget to amortize the up-front design
and software development cost over the expected volume.

Forth can be ported relatively easily compared to many other languages,
but nothing is free (unless it is being subsidized by somebody or
something else).

Also, remember that selling small quantities is expensive in terms of
support cost per unit sold.

Mitch Bradley, wmb@Eng.Sun.COM
cwpjr@cbnewse.att.com (clyde.w.jr.phillips) (12/07/90)
In article <9012061501.AA20109@ucbvax.Berkeley.EDU>, wmb@MITCH.ENG.SUN.COM writes:
> > I'd also welcome any discussion of creating "generic"
> > FORTH engines using FPGA's that WOULD BE affordable
> > on singles quantities.
>
> In calculating the cost, don't forget to amortize the up-front
> design and software development cost over the expected volume.
>
> Forth can be ported relatively easily compared to many other
> languages, but nothing is free (unless it is being subsidized
> by somebody or something else).
>
> Also, remember that selling small quantities is expensive in terms of
> support cost per unit sold.
>
> Mitch Bradley, wmb@Eng.Sun.COM

Mitch,

I still think you know this: an FPGA or similar device is essentially
like a PAL or EPROM. That is, a FORTH engine "design" translated by an
FPGA "compiler" would allow any one of us to purchase the appropriate
commodity FPGA and, with the appropriate FPGA "burner" (like an EPROM
burner, but really a glorified PAL programmer), simply roll our own
generic processor. Reality may demand 2 or 3 possibly different types
of "PALs" to implement a usable "processor", but that will change.

Since I'm speaking of this as a FIG-like project -- i.e., the FORTH
engine design being PD and generic, with each individual who wants one
getting a copy of the design, buying the part, and getting it "blown" --
there is no heavy cost on any one participant once the design work is
done.

I *see* this as a real possibility. I have for years, but it is seeming
more feasible, and with the current state of affairs, more desirable.

Do you get my point? I'd like to debate my vision, not the standard mfg
route (which you and I agree upon, I believe).

P.S. FPGA's will be amortized over a HUGE market, so we would
essentially be riding the coattails of the entire industry. Also, just
like FIG, I could see vendors coming along to update and customize the
design for ASIC purposes and to sell support.

Let me know if you are tuned into this and want to flesh it out.
( Open Invitation One & all ) --Clyde
mef@aplcen.apl.jhu.edu (Marty Fraeman) (12/07/90)
Well, as a fellow who's actually built 3 Forth chips, I just couldn't
keep my trap shut any longer on the economics of building processors
and using FPGAs as an implementation technology.

Certainly Mitch's comments regarding the economics of processor design
are valid. So doing a new Forth chip based solely on the hope that
you'd make money selling the chips themselves seems pretty unlikely to
me. But other factors should also be considered. For example, the
argument I've been able to use around here is that we can build and
program a one-of-a-kind embedded system far less expensively based on a
Forth chip than with traditional approaches. The language itself
accounts for some of this naturally, but the chip is needed to give us
acceptable performance at the same time. In fact this tradeoff is so
dramatic that the processor development effort (especially with the
productivity possible with modern silicon design tools) can pay for
itself on the first project that uses it!

Now onto FPGA-built processors. There is absolutely no doubt in my mind
that you could build a processor of some sort using FPGAs. I suspect
you could even do something like my favorite machine (the SC32 of
course :-), although other approaches might better suit the building
blocks in an FPGA (i.e., I suspect Phil's WISC architecture would work
out pretty well). Ah, but how fast would such a machine be? Sloooooow.
(Well, in my opinion anyway. I suppose someone'll build it and make me
a liar, but until then...) The programmable wire routing between logic
elements adds lots of capacitance, so lots of electrons need to wander
back and forth, and that all takes time.

For example, how come the RTX2000, a 16-bit standard cell design, and
the SC32, a 32-bit compiled chip, both run at 10 MHz? The SC32 is a 2u
part and I think the RTX is too. The data path of the SC32 is
essentially a custom design, so wire paths are short and loading is
minimal.
The RTX data path is logic gates lined up in neat rows with lots of
computer-placed wire hooking up the gates to do things. That slows
stuff up (a lot in the case of the data path). Yet the RTX standard
cells are dense and close together compared to an FPGA. So I think
you'd be lucky to get 1-5 MHz out of an FPGA-based processor. At those
speeds the fancy new chips (SPARC, MIPS, AMD29000, M88000, ...) could
probably run Forth almost as fast as your machine, while they'd be
cheaper, lower power, less board space, etc.

Marty Fraeman                   mef@glinda.jhuapl.edu
301-953-5000, x8360             Room 13-s587
Johns Hopkins University/Applied Physics Laboratory
Johns Hopkins Road
Laurel, Md. 20723
adyer@milo.wyse.com (Andrew Dyer x2446) (12/11/90)
I don't think your comments are necessarily true. Several vendors have
arrays with approx. 2000 2-input NAND-equivalent gates, which will run
at toggle rates of 70 MHz (Xilinx and AMD). I wouldn't trust I/O rates
to be more than 50 MHz though. Assuming a 50 MHz clock and 6 clock
cycles/instruction, you get an 8.33 MHz instruction rate. Not too
shabby.

The other problem is that FPGAs are expensive, and it would take
several of them if they were the only components. For a one-shot system
that's o.k., but if it's to be ``public domain'' hardware, then it
should be a bit simpler (IMHO).

Rather than FPGA's exclusively, I would be inclined to use a mixture of
LSI-type parts like register files, dual-port memories, and ALU's, plus
some FPGA logic for ``glue''. If one chose the correct parts, the
design could be easily migrated to a standard cell or gate array
library. (2900 series bit-slice components, for example, are available
from at least one vendor.)

--
{uunet, mips, decwrl}!wyse!adyer or adyer@wyse.com
"One day I asked the angels for inspiration, and the devil bought me a
drink. He's been buying them ever since."
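[Editor's note: the instruction-rate estimate in the post above is easy to check; here is a minimal Python sketch of that arithmetic. The 50 MHz clock and 6 cycles/instruction figures are the poster's assumptions, not measured values.]

```python
# Back-of-envelope check of the FPGA instruction-rate estimate above.
# Assumptions taken from the post: 50 MHz usable clock rate, and
# 6 clock cycles per Forth instruction.
clock_mhz = 50.0
cycles_per_instruction = 6

instruction_rate_mhz = clock_mhz / cycles_per_instruction
print(f"{instruction_rate_mhz:.2f} MHz")  # prints "8.33 MHz", matching the post
```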
cwpjr@cbnewse.att.com (clyde.w.jr.phillips) (12/11/90)
In article <ADYER.90Dec10180623@milo.wyse.com>, adyer@milo.wyse.com (Andrew Dyer x2446) writes:

(I think this is in response to Marty Fraeman's remark that an FPGA
engine would be SLOOOOOOW!)

> I don't think your comments are necessarily true. Several vendors have
> arrays with approx. 2000 2-input NAND-equivalent gates, which will run
> at toggle rates of 70 MHz.(Xilinx and AMD) I wouldn't trust I/O rates
> to be more than 50 Mhz tho. Assuming a 50 MHz clock, and 6 clock
> cycles/instruction you get 8.33 MHz cycle rate. Not too shabby.
>
> The other problem is that FPGAs are expensive, and it would take
> several of them if they were the only components. For a one shot
> system that's o.k., but if it's to be ``public domain'' hardware, then
> it should be a bit simpler (IMHO).
>
> Rather than FPGA's exclusively, I would be inclined to use a mixture
> of LSI type parts like register files, dual port memories, ALU's and
> some FPGA logic for ``glue''.

I believe I did reference this issue. And there are neat sequencer
PALs, register file, ALU, and bus interface chips out there....

> If one chose the correct parts, the design could be easily migrated to
> a standard cell or gate array library. (2900 series bit slice
> components, for example, are available from at least one vendor.)

So at least two people believe it's doable!

--Clyde
mef@aplcen.apl.jhu.edu (Marty Fraeman) (12/12/90)
In article <ADYER.90Dec10180623@milo.wyse.com> adyer@milo.wyse.com (Andrew Dyer x2446) writes:
> I don't think your comments are necessarily true. Several vendors have
> arrays with approx. 2000 2-input NAND-equivalent gates, which will run

Well, let's see now. Both the SC32 and RTX2000 family basically have
three separate address spaces that can be accessed each cycle: main
memory for instructions and data, parameter stack memory, and data
stack memory. My belief is that this is the key feature needed to make
a high speed Forth engine. Both Koopman and Hayes have shown that the
stack memories should be at least 16 words deep before overflow
mechanism overhead becomes negligible. So a 16-bit machine should have
at least 2*16*16 bits of memory tightly coupled to the CPU. A single
bit of memory takes at least 2 2-input NAND gates -- about 1K gates
total just for the stacks, or at least half of your FPGA.

If you take the stacks off the FPGA and put them in static RAM like the
Novix chip did, then you take a big speed hit. For proof look at the
top speed of the Novix vs. the RTX2000. Both were using around 2u
technology (although the Novix was a gate array and the RTX is a
standard cell), yet the RTX is more than twice as fast.

> at toggle rates of 70 MHz.(Xilinx and AMD) I wouldn't trust I/O rates
> to be more than 50 Mhz tho. Assuming a 50 MHz clock, and 6 clock
> cycles/instruction you get 8.33 MHz cycle rate. Not too shabby.

Yes, for one flip flop maybe, but what happens when you finish routing
a real circuit?

> The other problem is that FPGAs are expensive, and it would take
> several of them if they were the only components. For a one shot
> system that's o.k., but if it's to be ``public domain'' hardware, then
> it should be a bit simpler (IMHO).
>
> Rather than FPGA's exclusively, I would be inclined to use a mixture
> of LSI type parts like register files, dual port memories, ALU's and
> some FPGA logic for ``glue''.
> If one chose the correct parts, the design could be easily migrated to
> a standard cell or gate array library. (2900 series bit slice
> components, for example, are available from at least one vendor.)

Yes, you could do this, and Phil Koopman already did. In fact Phil
migrated his WISC 32 from TTL to a standard cell design while at
Harris. Perhaps he could comment on performance of the discrete vs.
integrated implementation.

Marty Fraeman                   mef@aplcen.apl.jhu.edu
301-953-5000, x8360             Room 13-s587
Johns Hopkins University/Applied Physics Laboratory
Johns Hopkins Road
Laurel, Md. 20723
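[Editor's note: the gate-budget argument in the post above can be made concrete with a little arithmetic. This Python sketch uses only figures stated in the thread -- Fraeman's 2-NAND-gates-per-bit lower bound and stack dimensions, and Dyer's 2000-gate FPGA size.]

```python
# Rough gate budget for keeping two 16-deep, 16-bit stacks on-chip,
# following the estimate in the post above.
stacks = 2            # parameter stack + data stack
depth_words = 16      # minimum depth before overflow overhead matters
word_bits = 16        # a 16-bit machine
gates_per_bit = 2     # lower bound: 2 two-input NAND gates per stored bit

stack_bits = stacks * depth_words * word_bits   # 2*16*16 = 512 bits
stack_gates = stack_bits * gates_per_bit        # 1024 gates, i.e. "about 1K"
fpga_gates = 2000                               # NAND-equivalent gates (Dyer's figure)

# The stacks alone consume at least half of the FPGA's gate budget.
print(stack_gates, stack_gates / fpga_gates)    # prints "1024 0.512"
```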
adyer@milo.wyse.com (Andrew Dyer x2446) (12/14/90)
In article <1990Dec11.181204.10500@aplcen.apl.jhu.edu> mef@aplcen.apl.jhu.edu (Marty Fraeman) writes:

(mucho stuff deleted)

> If you take the stacks off the FPGA and put them in static RAM like
> the Novix chip did, then you take a big speed hit. For proof look at
> the top speed of the Novix vs. the RTX2000.

I am not familiar with the Novix implementation, but I would think that
some of the cache RAMs available today could be used for this function.

> > at toggle rates of 70 MHz.(Xilinx and AMD) I wouldn't trust I/O rates
> > to be more than 50 Mhz tho. Assuming a 50 MHz clock, and 6 clock
> > cycles/instruction you get 8.33 MHz cycle rate. Not too shabby.
>
> Yes, for one flip flop maybe, but what happens when you finish routing
> a real circuit?

As long as everything is synchronous, things shouldn't be too bad. We
have done a couple of tests of these FPGAs at close to 50 MHz. Routing
was important, and so was on/off chip delay, but after mucking about
some we got them to simulate o.k. Admittedly it was never made, but I
believe their simulator was reasonably accurate.

(small edit here)

> Perhaps he (Phil Koopman) could comment on performance of the discrete
> vs. integrated implementation.

I wouldn't expect a discrete version to be able to do more than 75% of
what the integrated version would, but I was thinking of ``public
domain'' hardware and not a release of a commercial product.

--
{uunet, mips, decwrl}!wyse!adyer or adyer@wyse.com
"I think I woke up on the wrong side of the food chain today..."