[comp.lang.forth] SC32 performance and availability

mef@aplcen.apl.jhu.edu (Marty Fraeman) (11/16/89)

Recently there have been some postings to this group (which of
course have since expired on my machine) about the SC32
that have kind of bothered me.  I'm hardly 
unbiased about the SC32 since I am one of the designers 
of the chip but on the other hand I do have accurate information
about its performance and availability.

The SC32 instruction set is similar in flavor to the
Novix (and therefore Harris RTX 2000 family) processor.  
However, there are significant improvements in comparisons, 
branching, memory access operations and the way stacks are 
supported.  Multiple Forth primitives can often be compacted 
into a single machine instruction.  So to first order the 
speed of the SC32 should be a little better than a Novix 
style chip running at the same instruction execution rate.
I.e. the commercially available 10 MHz SC32 chips will eat
up Forth code at least as fast as a Harris 10 MHz RTX 2001.
Note that I said RTX 2001 as the SC32 does not have an on-chip
hardware multiplier although it can perform a 16x16 multiply
in as few as 21 cycles.  Also please keep in mind that the 
RTX parts need a 20 MHz clock to execute instructions at a 
10 MHz rate.  Both machines need similar speed program memory
to run at this speed.
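
As a rough worked example of the figures above (a sketch, not a
benchmark): at a 10 MHz instruction rate each cycle lasts 100 ns, so
a 21-cycle 16x16 multiply takes about 2.1 microseconds in the best
case.  The function names below are made up for illustration, and the
code assumes that one "cycle" is one period of the instruction rate.
The divide-by-two for the RTX simply restates the 20 MHz clock versus
10 MHz instruction rate relationship mentioned above.

```python
# Back-of-the-envelope timing from the figures quoted in the post.
# Assumption: one "cycle" equals one period of the instruction rate.

def cycle_time_ns(rate_mhz):
    """Period in nanoseconds of a clock running at rate_mhz."""
    return 1000.0 / rate_mhz

def operation_time_us(cycles, rate_mhz):
    """Time in microseconds for an operation taking `cycles` cycles."""
    return cycles * cycle_time_ns(rate_mhz) / 1000.0

def instruction_rate_mhz(clock_mhz, clocks_per_instruction):
    """Instruction rate when each instruction needs several input clocks."""
    return clock_mhz / clocks_per_instruction

# SC32: best-case 16x16 multiply, 21 cycles at a 10 MHz rate.
sc32_multiply_us = operation_time_us(21, 10.0)

# RTX: a 20 MHz clock input yields a 10 MHz instruction rate.
rtx_rate_mhz = instruction_rate_mhz(20.0, 2)

print(sc32_multiply_us, rtx_rate_mhz)
```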

Estimating the performance of chips under development is
of course not likely to be very accurate.  But based on
the talk given by Phil Koopman at the Rochester Forth
conference two years ago and by Rick Van Norman at the
conference this year, the architecture of the RTX 4000 
is significantly different from either the RTX 2000 or the
SC32.  A comparison of the two approaches to supporting
Forth in hardware would certainly be an interesting study
but I don't believe the outcome is at all a foregone conclusion.
Perhaps Phil could enlighten us.

Now some words about where and how the SC32 was developed
and how that impacts chip availability for high volume usage.
The SC32 was developed at the Johns Hopkins University Applied
Physics Laboratory to support embedded computer applications.  
We are using the part on numerous projects including a satellite 
instrument controller.  JHU/APL is over 45 years old and is one 
of the largest university run research labs in the country.
We developed the chip using a silicon compiler and can easily
recompile the design to over 20 different fab lines from a
wide variety of silicon foundries with feature sizes as fine
as 1 micron.  A new release of this CAD software now in beta
test supports submicron features.  The current 10 MHz implementation
of the SC32 was built with a 2 micron process.  I have already
recompiled the SC32 design targeted to a 1 micron technology and
the part's speed should more than double.

What does this all mean about availability of the SC32 to
Silicon Composers customers?  First off, the fab house that
was used for the current version of the SC32 uses a direct
e-beam write on wafer process that is ideal for low volume
production.  If high volume production becomes necessary then
the current fab house can easily transfer the chip design to a
traditional high volume foundry that has an identical process.
Of course we could also retarget the design using our silicon 
compiler for a different high volume line.  So clearly the SC32
could easily go into very high volume production if such demands
arise.  In the meantime, low volume applications can still be 
cost effectively satisfied through the current arrangement.

In order to make the SC32 widely available, APL has granted 
an exclusive license to Silicon Composers for commercial 
applications.  Silicon Composers currently obtains their chips
from the same foundry APL used.  Silicon Composers also performs 
testing and burn-in on the chips and they have developed an 
IBM-PC SC32 co-processor card with support software.  My understanding
is that Silicon Composers is willing to sublicense the
design to high volume customers.  Such a sublicense will allow
those high volume users to directly negotiate costs (potentially
with all the foundries supported by the silicon compiler).
This competition has the potential to greatly lower costs for
the high volume user when compared to buying chips from a single 
source (for example Intel 80386 or Harris RTX2000).  Finally,
remember that this is an APL design.  Should Silicon Composers 
not make it (heaven forbid -) rights to the chip can still be
obtained from APL.  In short, I feel the high volume user (and
probably even the middle volume customer) can feel confident
that they can get the SC32 for a long time to come.

For low volume users (like me for example), the biggest advantage 
of the SC32 is that it is available NOW!!  Call Silicon Composers and
you too can have one on your desk within a week.  


	Marty Fraeman

	mef@aplcen.apl.jhu.edu
	301-953-5000, x8360

	JHU/Applied Physics Laboratory
	Johns Hopkins Road
	Laurel, Md. 20707

koopman@a.gp.cs.cmu.edu (Philip Koopman) (11/16/89)

In article <3891@aplcen.apl.jhu.edu>, mef@aplcen.apl.jhu.edu (Marty Fraeman) writes:
 
> From: mef@aplcen.apl.jhu.edu (Marty Fraeman)
> Subject: SC32 performance and availability
 
> Estimating the performance of chips under development is
> of course not likely to be very accurate.  But based on
> the talk given by Phil Koopman at the Rochester Forth
> conference two years ago and by Rick Van Norman at the
> conference this year, the architecture of the RTX 4000 
> is significantly different from either the RTX 2000 or the
> SC32.  A comparison of the two approaches to supporting
> Forth in hardware would certainly be an interesting study
> but I don't believe the outcome is at all a foregone conclusion.
> Perhaps Phil could enlighten us.
 
A comparison between the RTX 32P and the RTX 2000 showed
that they took about the same number of clock cycles to execute a mix
of Forth instructions.  The minimum time to execute an instruction
on the RTX 32P was 2 clocks, but it made up for this by supporting
higher-level instructions (e.g. ROT and 2OVER ) and by combining
subroutine calls with opcodes "for free".  The RTX 4000 takes
fewer clock cycles for the average instruction than the RTX 32P.
It would be premature to claim that this does more than
equal the increase in power of the SC32 over the RTX 2000.
So, I agree that the jury is still out.
 
BUT, number of clock cycles is not the entire issue.  For embedded
real time control, memory chip speed is usually a consideration
because of a combination of cost, power/cooling, and size concerns.
The 32-bit RTX series uses 2 clocks per memory cycle
instead of 1 clock per memory cycle.  That means that if the limiting
factor in your system is memory chip speed, you get at least twice
the clock frequency with the 32-bit RTX family as with the RTX 2000 or
SC32.  The RTX 4000 is being optimized for total system solution
effectiveness, *not* raw speed at any cost.
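
To make the memory-speed argument concrete, here is a small sketch.
The 100 ns memory cycle time is an arbitrary illustrative figure, not
a number from any data sheet, and the function name is hypothetical.
It only restates the arithmetic: with a fixed memory cycle time, a
part that spends 2 clocks per memory cycle can be clocked twice as
fast as one that spends 1 clock per memory cycle.

```python
def max_clock_mhz(memory_cycle_ns, clocks_per_memory_cycle):
    """Highest clock rate (MHz) at which one memory cycle still fits
    in the given number of processor clocks."""
    return clocks_per_memory_cycle * 1000.0 / memory_cycle_ns

# Assumed for illustration only; not taken from the post.
MEMORY_CYCLE_NS = 100.0

# 1 clock per memory cycle (RTX 2000 or SC32 style).
one_clock_mhz = max_clock_mhz(MEMORY_CYCLE_NS, 1)

# 2 clocks per memory cycle (32-bit RTX style).
two_clock_mhz = max_clock_mhz(MEMORY_CYCLE_NS, 2)

print(one_clock_mhz, two_clock_mhz, two_clock_mhz / one_clock_mhz)
```

Clock frequency is not the same as instruction throughput: as noted
above, the minimum instruction time on the RTX 32P is 2 clocks, so
the doubled clock has to be weighed against clocks per instruction.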
 
  Phil Koopman                koopman@greyhound.ece.cmu.edu   Arpanet
  2525A Wexford Run Rd.
  Wexford, PA  15090
Senior Scientist at Harris Semiconductor.
I don't speak for them, and they don't speak for me.