larryh@tekcbi.UUCP (Larry Hutchinson) (06/17/85)
Last Friday, I had a chance to play with a (physically) small computer
built around the Novix Forth chip. The system had condensed from the
vapor state only a few days before but was running remarkably well for
a first pass (and very few green wires too!).

The chip, for those who don't know, is a CMOS gate array (4000 gates, I
believe) designed by Forth inventor Chuck Moore and the folks at Novix
Inc. to directly execute 16 bit Forth at a rate of approximately 10
million Forth primitives per second (at a clock rate of 8 MHz). See the
cover story of Electronic Design, Mar 21, 1985 for more details.

About the only thing that I could think of doing during the few minutes
that I had with the machine was to run the ol' Fibonacci series. As I
had just come from a banquet and was full of both food and spirits, the
following must be taken with a kilogram of NaCl. (BTW, I believe that
the machine was not running at the full 8 MHz -- 6 MHz I think.)

The results sort of seem to possibly indicate that the Novix board runs
FIB(20) about 40 times as fast as my own 16 bit 68000 Forth running on
my own rather slow 68000 system (8 MHz, 1 wait state for read, 2 for
write -- I've gotta fix that!) and about 7 times as fast as the same
system running assembly language. A modern 12 MHz 68010 machine with no
wait states would probably run twice as fast as mine. MacFORTH (16 bit
token, 32 bit arithmetic) runs about half as fast as my Forth.

Perhaps someone with a better memory and/or more time with the machine
should give their impressions.

Details of little or no interest:

The algorithm for the Fibonacci series is as follows (I hope!):

   FIB(n) = 1                      if n = 0 or 1
   FIB(n) = FIB(n-2) + FIB(n-1)    otherwise

(BTW, FIB(20) = 10946.)

The Forth source follows:

   : FIB  ( n ][ FIB[n] -- calculates an element from the Fibonacci series)
      RECURSIVE
      DUP 2 < IF  DROP 1         ( FIB[0 or 1]= 1 )
      ELSE  DUP 2 - FIB          ( FIB[n-2] )
            SWAP 1 - FIB         ( FIB[n-1] )
            +                    ( FIB[n]= FIB[n-2] + FIB[n-1] )
      THEN ;

The assembly language version (my own syntax, but you should be able to
figure it out; also, I make no claims that the following is optimal):

   CREATE (FIB)                 ( just a label - the recursive part )
      CMP S ) TO D7  < IF       ( d7=2.  S= parameter stack.  FIB[0 or 1]? )
         MOVE D6 TO S )         ( d6=1.  FIB[0 or 1] = 1 )
         RTS                    ( done )
      THEN                      ( here if n > 1 )
      MOVE S ) TO S -)          ( copy of n )
      SUBQ 2 FROM S )           ( n-2 )
      BSR (FIB)                 ( calc FIB[n-2] )
      MOVE 2 S X) TO D0         ( get n )
      MOVE S )+ TO S )          ( save FIB[n-2] )
      SUBQ 1 FROM D0            ( make n-1 )
      MOVE D0 TO S -)
      BSR (FIB)                 ( calc FIB[n-1] )
      MOVE S )+ TO D0
      ADD D0 TO S )             ( return FIB[n-1]+FIB[n-2] on param stk )
      RTS

   CODE FIB  ( n ][ FIB[n] -- as above but assembler version )
      MOVEQ 2 TO D7             ( useful constant )
      MOVEQ 1 TO D6             ( ditto )
      BSR (FIB)                 ( go do the real work )
      NEXT

Timings for FIB(20):

   Novix machine                 49 ms
   Assembler, home machine      338 ms
   Forth, home machine         1800 ms
   Mac (MacForth)              3600 ms

Novix address: 10590 N. Tantau Ave., Cupertino, CA 95014, 408-996-9363

Disclaimer: The usual plus "yes I know benchmarks are meaningless,
especially this one".

-- 
Larry Hutchinson, Tektronix, Inc. PO Box 500, MS Y6-546, Beaverton, OR 97077
{ decvax,allegra }!tektronix!tekcbi!larryh
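The quoted speed ratios can be checked against the timing table with a few lines of standard Forth. This is only a sketch: the constant names and the word RATIO10 are made up for illustration and appear nowhere in the post, and each result is ten times the ratio, truncated, so that it fits in a 16 bit cell.

```forth
\ Timings for FIB(20) from the table above, in milliseconds.
49   CONSTANT NOVIX-MS
338  CONSTANT ASM-MS
1800 CONSTANT FORTH-MS
3600 CONSTANT MAC-MS

\ RATIO10 ( ms -- r )  ten times the slowdown relative to the Novix
\ board, truncated.  */ keeps a double-width intermediate product,
\ so this is safe even with 16 bit cells.
: RATIO10 ( ms -- r )  10 NOVIX-MS */ ;

FORTH-MS RATIO10 .   \ prints 367, i.e. roughly 37 times
ASM-MS   RATIO10 .   \ prints 68,  i.e. roughly 7 times
MAC-MS   RATIO10 .   \ prints 734, i.e. roughly 73 times
```

So the table gives a factor nearer 37 than 40 for the home Forth, which is well inside the stated kilogram of NaCl.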
wolpert@hpisla.UUCP (David Wolpert) (06/19/85)
# Written 8:30 am Jun 17, 1985 by larryh@tekcbi.UUCP in net.arch
>...at
>a rate of approximately 10 Million Forth primitives per second (at a clock
>rate of 8 MHz).

Amazing!
(a.k.a. Unbelievable!)

>...the
>following must be taken with a kilogram of NaCl....

So, I think, should the preceding.
Unless I am missing something, even an *extremely* complex processor
could execute only 8 "Million Forth primitives per second (at a clock
rate of 8 MHz)."

David Wolpert
Hewlett-Packard Company, Instrument Systems Lab
P O Box 301 - Loveland, Colorado - 80539   (USMail)
{*!}{hp*!}hpisla!wolpert   (un*x)

*** RAW BITS: Not For Everybody ***
steve@kontron.UUCP (Steve McIntosh) (06/20/85)
> >...the
> >following must be taken with a kilogram of NaCl....
>
> So, I think, should the preceding.
> Unless I am missing something, even an *extremely* complex processor
> could execute only 8 "Million Forth primitives per second (at a clock
> rate of 8 MHz)."
>
> David Wolpert

David - the FORTH chip is really a RISC machine using the Forth virtual
machine as a model. It can execute more primitives per second than
clocks per second because the 16 bit opcodes (executed in one clock
cycle) can specify several primitives in parallel.

As an example, the Forth phrase

   DUP @ SWAP nn +

which is useful for marching through a data structure, is one opcode on
the Forth chip.

Much of the reason that this can be done is that the chip accesses main
memory, the data stack and the return stack in parallel. It can access
48 bits of data on each clock cycle as well as the top two items of the
data stack, which are kept in registers.

It is possible that this chip may show up as the core of a single user
workstation, but its main market is for use as an embedded control
processor. It will be some time before it is seen in "general purpose"
computers, if ever. (More's the pity.)
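The phrase DUP @ SWAP nn + is easy to try on any standard Forth. A minimal sketch, assuming nn is the size of one cell; the names SAMPLES and FETCH+ are invented for this example and come from neither the post nor the chip's documentation.

```forth
\ Three cells of sample data to march through.
CREATE SAMPLES  10 , 20 , 30 ,

\ FETCH+ ( addr -- x addr' )  the phrase from the post with nn = 1 CELLS:
\ fetch the cell at addr, then leave the address of the next cell on top.
: FETCH+ ( addr -- x addr' )  DUP @ SWAP 1 CELLS + ;

SAMPLES FETCH+ FETCH+ FETCH+ DROP
. . .   \ prints 30 20 10 (top of stack first)
```

On a conventional threaded Forth each word of the phrase is dispatched separately; the claim in the post is that the Novix chip folds the whole phrase into one opcode.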
john@frog.UUCP (John Woods) (06/24/85)
> >a rate of approximately 10 Million Forth primitives per second (at a clock
> >rate of 8 MHz).
> Amazing!
> (a.k.a. Unbelievable!)
> Unless I am missing something, even an *extremely* complex processor
> could execute only 8 "Million Forth primitives per second (at a clock
> rate of 8 MHz)."

I have the article on the Forth chip from Electronic Design, 21 March
1985 (DROOL DROOL DROOL!!!!!!!). The key to how it works is in the
following statement, quoted from said article:

   "The first chip to be released, the NC4000A, runs at a clock speed
   of 8MHz. Each instruction executes in a single clock cycle, and
   performs as many as five operations simultaneously -- for a chip
   speed of over 10 million operations per second."

Most FORTH operations require most of the available chip operations per
cycle, but several FORTH operations don't, and can easily be combined
with idiomatically subsequent FORTH operations for simultaneous
execution, e.g. operation pairs like @ +, OVER SWAP -, or the amazing
DUP @ SWAP nn + (an incrementing fetch, basically).

The chip (from the article) basically looks rather like a bitslice
processor which you get to write the microcode for (rather than it
interpreting macrocode). The microcode is specially designed to be
graceful for much of FORTH. It also has 5 busses: two I/O busses, a
"data stack" bus, a "return stack" bus, and a "main memory" bus.

Implementing C on it looks like it might be tricky, but it is clearly
the ideal FORTH chip.

-- 
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, ...!mit-eddie!jfw, jfw%mit-ccc@MIT-XX.ARPA

The State Department is paying me to post this message, but if I am caught,
they will disavow all knowledge of my actions.
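The arithmetic behind the disputed figure is short enough to write down. A sketch in standard Forth, with made-up names (CLOCK-KHZ, TARGET-KPPS, AVG100) and with the units scaled to thousands so the values fit in 16 bit cells; it assumes exactly one instruction per clock, as the quoted article says.

```forth
8000  CONSTANT CLOCK-KHZ     \ 8 MHz clock, one instruction per cycle
10000 CONSTANT TARGET-KPPS   \ "approximately 10 million" primitives/second

\ AVG100 ( -- n )  average primitives per instruction needed to reach
\ the target rate, times 100.  */ uses a double-width intermediate.
: AVG100 ( -- n )  TARGET-KPPS 100 CLOCK-KHZ */ ;

AVG100 .   \ prints 125, i.e. 1.25 primitives per instruction
```

So the 10 million figure needs only about one instruction in four to carry a second primitive, if the rest carry one each. Whether the article's "operations" and Forth primitives are counted the same way is not something this sketch can settle.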