[net.arch] Novix Forth chip seen at Rochester Conference

larryh@tekcbi.UUCP (Larry Hutchinson) (06/17/85)

Last Friday, I had a chance to play with a (physically) small computer
built around the Novix Forth chip.  The system had condensed from the
vapor state only a few days before but was running remarkably well for a
first pass (and very few green wires too!).  The chip, for those who don't
know, is a CMOS gate-array (4000 gates, I believe) designed by Forth founder
Chuck Moore and the folks at Novix inc. to directly execute 16 bit Forth at
a rate of approximately 10 Million Forth primitives per second (at a clock
rate of 8 MHz).  See the cover story of Electronic Design, Mar 21, 1985 for
more details.

About the only thing that I could think of doing during the few minutes that
I had with the machine was to run the ol' Fibbonachi (sp???) series.  As I
had just come from a banquet and was full of both food and spirits, the
following must be taken with a killogram of NaCl.  (BTW, I believe that the
machine was not running at the full 8 MHz -- 6 MHz I think.)  The results
sort of seem to possibly indicate that the Novix board runs FIB(20) about
40 times as fast as my own 16 bit 68000 Forth running on my own rather slow
68000 system (8 MHz, 1 wait state for read, 2 for write -- I've gotta fix
that!) and about 7 times as fast as the same system running assembly language.
A modern 12 MHz 68010 no wait state machine would probably run twice as fast.
MacFORTH (16 bit token, 32 bit arithmetic) runs about half as fast as my Forth.

Perhaps someone with a better memory and/or more time with the machine should
give their impressions.

Details of little or no interrest:

    The algorithm for the Fibbonachi series is as follows (I hope!):
        FIB(n)= 1  if n= 0 or 1
        FIB(n)= FIB(n-2)+FIB(n-1)  otherwise
  ( BTW,  FIB(20)= 10946. )

The Forth source follows:

    :
FIB     ( n ][ FIB[n]  -- calculates an element from the Fibbonachi series)
    RECURSIVE
    DUP 2 < IF
        DROP 1                  ( FIB[0 or 1]= 1 )
    ELSE
        DUP 2 - FIB             ( FIB[n-2]  )
        SWAP 1 - FIB            ( FIB[n-1]  )
        +                       ( FIB[n]= FIB[n-2] + FIB[n-1] )
    THEN ;

The assembly language version:
(my own syntax, but you should be able to figure it out.  Also, I make no
claims that the following is optimal. )

    CREATE
(FIB)                           ( just a label - the recursive part )
    CMP S ) TO D7  < IF         ( d7=2.  S= parameter stack.  FIB[0 or1]? )
        MOVE D6 TO S )          ( d6=1.  FIB[0 or 1] = 1 )
        RTS                     ( done )
    THEN                        ( here if n > 1] )
    MOVE S ) TO S -)            ( copy of n )
    SUBQ 2 FROM S )             ( n-2 )
    BSR (FIB)                   ( cacl FIB[n-2]  )
    MOVE 2 S X) TO D0           ( get n )
    MOVE S )+ TO S )            ( save FIB[n-2] )
    SUBQ 1 FROM D0              ( make n-1 )
    MOVE D0 TO S -)
    BSR (FIB)                   ( calc FIB[n-1] )
    MOVE S )+ TO D0
    ADD D0 TO S )               ( return FIB[n-1]+FIB[n-2] on param stk )
    RTS

    CODE
FIB         ( n ][ FIB[n]  -- as above but assembler version )
    MOVEQ 2 TO D7               ( usefull constant )
    MOVEQ 1 TO D6               ( ditto )
    BSR (FIB)                   ( go do the real work )
    NEXT

Timings for FIB(20):
    Novix machine             49 ms
    Assembler, home machine  338 ms
    Forth, home machine     1800 ms
    Mac (MacForth)          3600 ms

Novix address:  10590 N. Tantau Ave., Cupertino, CA 95014,  408-996-9363

Disclamer:  The usual plus "yes I know benchmarks are meaningless, especially
this one".

Larry Hutchinson, Tektronix, Inc. PO Box 500, MS Y6-546, Beaverton, OR 97077
{ decvax,allegra }!tektronix!tekcbi!larryh
-- 
Larry Hutchinson, Tektronix, Inc. PO Box 500, MS Y6-546, Beaverton, OR 97077
{ decvax,allegra }!tektronix!tekcbi!larryh

wolpert@hpisla.UUCP (David Wolpert) (06/19/85)

# Written  8:30 am  Jun 17, 1985 by larryh@tekcbi.UUCP in net.arch

>...at
>a rate of approximately 10 Million Forth primitives per second (at a clock
>rate of 8 MHz).

  Amazing!
  (a.k.a. Unbelievable!)

>...the
>following must be taken with a killogram of NaCl....

  So, I think, should the preceding.
  Unless I am missing something, even an *extremely* complex processor
  could execute only 8 "Million Forth primitives per second (at a clock
  rate of 8 MHz)."

David Wolpert
              Hewlett-Packard Company,  Instrument Systems Lab
              P O Box 301 - Loveland, Colorado - 80539 (USMail)
              {*!}{hp*!}hpisla!wolpert (un*x)
*** RAW BITS: Not For Everybody ***

steve@kontron.UUCP (Steve McIntosh) (06/20/85)

> >...the
> >following must be taken with a killogram of NaCl....
> 
>   So, I think, should the preceding.
>   Unless I am missing something, even an *extremely* complex processor
>   could execute only 8 "Million Forth primitives per second (at a clock
>   rate of 8 MHz)."
> 
> David Wolpert

David - the FORTH chip is really a RISC machine using the Forth 
virtual machine as a model. It can execute more primatives per
second than clocks per second because the 16 bit opcodes (executed
in one clock cycle) can specify several primatives in parallel.
As an example, the forth phrase DUP @ SWAP nn + which is useful
for marching thru a data structure is one opcode on the Forth chip.
 
Much of the reason that this can be done is that the chip accesses
main memory, the data stack and the return stack in parallel. It can
access 48 bits of data on each clock cycle as well as the top two
items of the data stack, which are kept in registers.

It is possible that this chip may show up as the core of a single
user workstation, but its main market is for use as an embedded
control processor. It will be some time before it is seen in
"general purpose" computers, if ever. (more's the pity.)

john@frog.UUCP (John Woods) (06/24/85)

> >a rate of approximately 10 Million Forth primitives per second (at a clock
> >rate of 8 MHz).
>   Amazing!
>   (a.k.a. Unbelievable!)
>   Unless I am missing something, even an *extremely* complex processor
>   could execute only 8 "Million Forth primitives per second (at a clock
>   rate of 8 MHz)."

I have the article on the Forth chip from Electronic Design, 21 March 1985
(DROOL DROOL DROOL!!!!!!!).  The key to how it works is in the following
statement, quoted from said article:

"The first chip to be released, the NC4000A, runs at a clock speed of 8MHz.
Each instruction executes in a single clock cycle, and performs as many as
five operations simultaneously -- for a chip speed of over 10 million
operations per second."

Most FORTH operations require most of the available chip operations, per
cycle, but several FORTH operations don't, and can be easily combined with
idiomatically subsequent FORTH operations for simultaneous execution, e.g.
operation pairs like @ +, OVER SWAP -, or the amazing DUP @ SWAP nn +
(an incrementing fetch, basically).

The chip (from the article) basically looks rather like a bitslice processor
which you get to write the microcode for (rather than it interpreting macro-
code).  The microcode is specially designed to be graceful for much of FORTH.
It also have 5 busses, two I/O busses, a "data stack" bus, a "return stack"
bus, and a "main memory" bus.  Implementing C on it looks like it might be
tricky, but it is clearly the ideal FORTH chip.

--
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, ...!mit-eddie!jfw, jfw%mit-ccc@MIT-XX.ARPA

The State Department is paying me to post this message, but if I am caught,
they will disavow all knowledge of my actions.