mef@aplcen.apl.jhu.edu (Marty Fraeman) (11/16/89)
Recently there have been some postings to this group about the SC32 (which of course have since expired on my machine) that have kind of bothered me. I'm hardly unbiased about the SC32, since I am one of the designers of the chip, but on the other hand I do have accurate information about its performance and availability.

The SC32 instruction set is similar in flavor to that of the Novix (and therefore Harris RTX 2000 family) processor. However, there are significant improvements in comparisons, branching, memory access operations, and the way stacks are supported. Multiple Forth primitives can often be compacted into a single machine instruction. So to first order the speed of the SC32 should be a little better than a Novix style chip running at the same instruction execution rate. I.e., the commercially available 10 MHz SC32 chips will eat up Forth code at least as fast as a Harris 10 MHz RTX 2001. Note that I said RTX 2001, as the SC32 does not have an on-chip hardware multiplier, although it can perform a 16x16 multiply in as few as 21 cycles. Also please keep in mind that the RTX parts need a 20 MHz clock to execute instructions at a 10 MHz rate. Both machines need similar speed program memory to run at this speed.

Estimating the performance of chips under development is of course not likely to be very accurate. But based on the talk given by Phil Koopman at the Rochester Forth conference two years ago and by Rick Van Norman at the conference this year, the architecture of the RTX 4000 is significantly different from either the RTX 2000 or the SC32. A comparison of the two approaches to supporting Forth in hardware would certainly be an interesting study, but I don't believe the outcome is at all a foregone conclusion. Perhaps Phil could enlighten us.

Now some words about where and how the SC32 was developed and how that impacts chip availability for high volume usage.

The SC32 was developed at the Johns Hopkins University Applied Physics Laboratory to support embedded computer applications. We are using the part on numerous projects, including a satellite instrument controller. JHU/APL is over 45 years old and is one of the largest university-run research labs in the country. We developed the chip using a silicon compiler and can easily recompile the design to over 20 different fab lines from a wide variety of silicon foundries with feature sizes as fine as 1 micron. A new release of this CAD software, now in beta test, supports submicron features. The current 10 MHz implementation of the SC32 was built with a 2 micron process. I have already recompiled the SC32 design targeted to a 1 micron technology and the part's speed should more than double.

What does this all mean about availability of the SC32 to Silicon Composers customers? First off, the fab house that was used for the current version of the SC32 uses a direct e-beam write on wafer process that is ideal for low volume production. If high volume production becomes necessary, then the current fab house can easily transfer the chip design to a traditional high volume foundry that has an identical process. Of course we could also retarget the design using our silicon compiler for a different high volume line. So clearly the SC32 could easily go into very high volume production if such demands arise. In the meantime, low volume applications can still be cost effectively satisfied through the current arrangement.

In order to make the SC32 widely available, APL has granted an exclusive license to Silicon Composers for commercial applications. Silicon Composers currently obtains their chips from the same foundry APL used. Silicon Composers also performs testing and burn-in on the chips, and they have developed an IBM-PC SC32 co-processor card with support software. My understanding is that Silicon Composers is willing to sublicense the design to high volume customers.
Such a sublicense will allow those high volume users to directly negotiate costs (potentially with all the foundries supported by the silicon compiler). This competition has the potential to greatly lower costs for the high volume user when compared to buying chips from a single source (for example Intel 80386 or Harris RTX2000).

Finally, remember that this is an APL design. Should Silicon Composers not make it (heaven forbid :-) rights to the chip can still be obtained from APL. In short, I feel the high volume user (and probably even the middle volume customer) can feel confident that they can get the SC32 for a long time to come.

For low volume users (like me for example), the biggest advantage of the SC32 is that it is available NOW!! Call Silicon Composers and you too can have one on your desk within a week.

Marty Fraeman
mef@aplcen.apl.jhu.edu
301-953-5000, x8360
JHU/Applied Physics Laboratory
Johns Hopkins Road
Laurel, Md. 20707

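The cycle counts quoted in the post above can be turned into times with a little arithmetic. What follows is only a rough sketch: the 10 MHz rate, the 21-cycle best-case multiply, and the 20 MHz RTX clock come from the post, while the function names are made up for illustration, and it assumes that the 21 cycles are counted at the SC32's 10 MHz instruction rate.

```python
def instruction_rate_hz(clock_hz: float, clocks_per_instruction: int) -> float:
    """Instruction execution rate for a part that needs a fixed
    number of input clocks per instruction (hypothetical helper)."""
    return clock_hz / clocks_per_instruction


def cycles_to_microseconds(cycles: int, rate_hz: float) -> float:
    """Elapsed time, in microseconds, for `cycles` cycles at `rate_hz`."""
    return cycles / rate_hz * 1e6


# Figures taken from the post.
SC32_RATE_HZ = 10e6        # commercially available 10 MHz SC32
RTX_CLOCK_HZ = 20e6        # RTX parts need a 20 MHz clock ...
RTX_CLOCKS_PER_INSTR = 2   # ... to execute at a 10 MHz instruction rate
MULTIPLY_CYCLES = 21       # best-case 16x16 multiply on the SC32

# Assumption: the 21 cycles run at the SC32's 10 MHz rate.
multiply_us = cycles_to_microseconds(MULTIPLY_CYCLES, SC32_RATE_HZ)
rtx_rate_hz = instruction_rate_hz(RTX_CLOCK_HZ, RTX_CLOCKS_PER_INSTR)

print(f"SC32 best-case 16x16 multiply: {multiply_us:.1f} us")
print(f"RTX instruction rate: {rtx_rate_hz / 1e6:.0f} MHz")
```

Under those assumptions the best-case software multiply comes out at about 2.1 microseconds, and the 20 MHz RTX clock works out to the same 10 MHz instruction rate as the SC32.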
koopman@a.gp.cs.cmu.edu (Philip Koopman) (11/16/89)
In article <3891@aplcen.apl.jhu.edu>, mef@aplcen.apl.jhu.edu (Marty Fraeman) writes:
> From: mef@aplcen.apl.jhu.edu (Marty Fraeman)
> Subject: SC32 performance and availability

> Estimating the performance of chips under development is
> of course not likely to be very accurate. But based on
> the talk given by Phil Koopman at the Rochester Forth
> conference two years ago and by Rick Van Norman at the
> conference this year, the architecture of the RTX 4000
> is significantly different from either the RTX 2000 or the
> SC32. A comparison of the two approaches to supporting
> Forth in hardware would certainly be an interesting study
> but I don't believe the outcome is at all a foregone conclusion.
> Perhaps Phil could enlighten us.

A comparison between the RTX 32P and the RTX 2000 showed that they took about the same number of clock cycles to execute a mix of Forth instructions. The minimum time to execute an instruction on the RTX 32P was 2 clocks, but it made up for this by supporting higher-level instructions (e.g. ROT and 2OVER) and by combining subroutine calls with opcodes "for free".

The RTX 4000 takes fewer clock cycles for the average instruction than the RTX 32P. It would be premature to claim that this does more than equal the increase in power of the SC32 over the RTX 2000. So, I agree that the jury is still out.

BUT, number of clock cycles is not the entire issue. For embedded real time control, memory chip speed is usually a consideration because of a combination of cost, power/cooling, and size concerns. The 32-bit RTX series uses 2 clocks per memory cycle instead of 1 clock per memory cycle. That means that if the limiting factor in your system is memory chip speed, you get at least twice the clock frequency with the 32-bit RTX family as with the RTX 2000 or SC32.

The RTX 4000 is being optimized for total system solution effectiveness, *not* raw speed at any cost.

Phil Koopman
koopman@greyhound.ece.cmu.edu (Arpanet)
2525A Wexford Run Rd.
Wexford, PA 15090
Senior Scientist at Harris Semiconductor.
I don't speak for them, and they don't speak for me.
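
The memory-speed point in the post above can be put in numbers. This is a minimal sketch, not anything from the original posts: the 100 ns memory cycle time is a hypothetical figure chosen for easy arithmetic, the function name is made up, and the clocks-per-memory-cycle counts are taken as stated in the post (2 for the 32-bit RTX series, 1 for the RTX 2000 and SC32).

```python
def max_clock_mhz(memory_cycle_ns: float, clocks_per_memory_cycle: int) -> float:
    """Highest processor clock (MHz) that a memory with the given cycle
    time can keep up with, if each memory cycle spans a fixed number of
    processor clocks (hypothetical helper)."""
    clock_period_ns = memory_cycle_ns / clocks_per_memory_cycle
    return 1000.0 / clock_period_ns


# Assumption: a 100 ns memory cycle time, picked only for illustration.
MEMORY_CYCLE_NS = 100.0

# Clocks per memory cycle, as stated in the post.
one_clock_parts = max_clock_mhz(MEMORY_CYCLE_NS, 1)   # RTX 2000 or SC32
two_clock_parts = max_clock_mhz(MEMORY_CYCLE_NS, 2)   # 32-bit RTX series

print(f"1 clock per memory cycle:  {one_clock_parts:.0f} MHz")
print(f"2 clocks per memory cycle: {two_clock_parts:.0f} MHz")
print(f"clock frequency ratio: {two_clock_parts / one_clock_parts:.0f}x")
```

With the same memory chips, the part that spreads a memory cycle over two clocks can be clocked twice as fast. The sketch says nothing about how many clocks each instruction takes, so it bounds clock frequency only and does not by itself settle which part runs Forth code faster.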