[comp.arch] Mass produced custom chips

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (04/12/91)

  Another posting got me thinking about this, but I'll start a separate
thread, since it's somewhat off topic.

  Will the day ever come when we can fast build custom CPUs?

  By that I mean the customer will be able to order a CPU built with
certain instructions hard-wired, perhaps some in microcode, designed
to run an o/s which will emulate the rest.

  Obviously this would have an upper bound so you couldn't have ALL
features, but consider trading a few register windows for another 8k
cache, or giving up some parallelism to get hardware divide.

  What this requires is a set of capabilities which I believe could be
available in the next decade.

  - fully automated chip layout. Doesn't have to be optimal; fully
    functional and nominal would do.

  - A program the customer could run on a PC or workstation to select
    the options, and then send them in by email, floppy, or whatever.
    (I think this capability is possible today)

  - direct computer controlled chip generation without a mask. Without
    this to keep costs down, the idea is too expensive to do.

  Okay, now everyone tell me what technology I missed. Remember that all
of these CPUs would still run the same software, so there is no need to
generate custom anything but silicon, or whatever we are using by the
time the rest of this could be done.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
        "Most of the VAX instructions are in microcode,
         but halt and no-op are in hardware for efficiency"

miklg@sono.uucp (Michael Goldman ) (04/15/91)

Bill Davidsen suggested a custom CPU as a chip for the future.

 I read about, and got some literature on, a chip from Philips
which is a programmable gate array with a programming time on the
order of ~1 ms.  Their idea is that people would put their code
into boolean, and *swap* it in and out of the gate array with
the process - e.g., it could be TCP/IP code one time slice and
X.25 code the next.  It's still fairly new, and the idea seems
exciting, but possibly ahead of its time.  It would be nice if
it really caught on, and we could get away from this concept of
a single CPU with registers, and simply have our algorithms
redesigned by the compiler to take advantage of the generality and
parallelism inherent in general logic.  We could have a number of
different architectures for the gate arrays, with our compiler
translating our code into Boolean, with the manufacturer's utility
translating that into the particular architecture of their gate
array.  There are a lot of applications that would benefit from
parallelism - matrix multiplication, searching, numerical integration,
etc.
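
To make the "compiler translates code into Boolean" step concrete,
here is a toy sketch (in Python, purely my own illustration - a real
tool would emit a netlist for the vendor's array, and all names here
are made up): a 1-bit full adder written as Boolean equations, then
replicated the way a gate array replicates a bit-slice.

```python
# Toy illustration: a full adder as Boolean equations - the kind of
# two-gate-level description a compiler might hand to a gate-array
# programming utility.  (Hypothetical sketch, not any vendor's format.)

def full_adder(a, b, cin):
    """One bit-slice: sum and carry as XOR/AND/OR equations."""
    s = a ^ b ^ cin                         # sum bit
    cout = (a & b) | (a & cin) | (b & cin)  # carry out
    return s, cout

def ripple_add(x, y, width=8):
    """Chain 'width' full adders, as a gate array would replicate them."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result  # result wraps modulo 2**width, like real hardware
```

The point is only that ordinary arithmetic decomposes into a regular
mesh of small Boolean cells, which is exactly the structure a
programmable gate array provides.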

I'm afraid without a certain momentum from a number of big users it
won't catch on, but maybe !? 

henry@zoo.toronto.edu (Henry Spencer) (04/16/91)

In article <1991Apr15.154955.2452@sono.uucp> miklg@sono.uucp (Michael Goldman ) writes:
> I read about, and got some literature on, a chip from Philips
>which is a programmable gate array with a programming time on the
>order of ~1 ms.  Their idea is that people would put their code
>into boolean, and *swap* it in and out of the gate array...
>I'm afraid without a certain momentum from a number of big users it
>won't catch on, but maybe !? 

I don't think it's going to be popular unless Philips is willing to publish
complete programming specifications, so you can generate programs for the
array without using proprietary software.  So far, the programmable-logic
manufacturers as a class get a grade of F- for their willingness to tell
mere mortals how to program the chips.  (I.e., they won't.)  Turkeys.
-- 
And the bean-counter replied,           | Henry Spencer @ U of Toronto Zoology
"beans are more important".             |  henry@zoo.toronto.edu  utzoo!henry

cs191049@cs.brown.edu (Aaron Smith) (04/17/91)

The ee department at Brown University (LEMS lab) is working on a reconfigurable
machine based on programmable gate arrays.  Right now, our major obstacle is
the reconfigure time.  With a reconfigure time of 1ms, several gate arrays
could be combined to produce a different configuration on every other clock
cycle or somewhere close to that.  I think reconfigurability holds great
promise.  Our current weapon of choice is the Xilinx line.

Aaron Smith 
Graduate Student
ats@lems.brown.edu

lindsay@gandalf.cs.cmu.edu (Donald Lindsay) (04/22/91)

In article <3329@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com 
	(bill davidsen) writes:
>Will the day ever come when we can fast build custom CPUs?
>  - fully automated chip layout.
>  - A program the customer could run on a PC or workstation to select
>    the options, and then send them in by email, floppy, or whatever.
>  - direct computer controlled chip generation without a mask.

There was a project a few years ago (U Arizona??) with the motto "Ada
to Silicon". I believe they claimed some success. So, the idea of
highly automated chip layout has been around a few times.

On another tack, see "Building and Using a Highly Parallel
Programmable Logic Array", IEEE Computer, Jan 1991 p.81. This is the
Splash system: software, plus a VME board containing 32 Xilinx FPGAs,
each with RAM. The software isn't as high level as you wish, but
there is a class of problems where they routinely run an order of
magnitude faster than a Cray-2 or a CM-2.

On the fabrication side, the trend has been to what you might call
megafabs, with long construction times and nine digit price tags. I
have been listening for years for hints that a minifab or microfab
could be built, and things are looking up. The military tends to need
very small production runs, so they have pushed for flexible
multipurpose equipment - for instance, machines that can do several
steps to a set of wafers before being reloaded. Add in ideas like
"clean boxes", the rise of fab-equipment interface standards, FIBs
(Focussed Ion Beams), laser and E-beam direct-write technologies,
solid-state laser/plasma x-ray sources, x-ray mirrors, etc and it
seems guaranteed that fab technology will evolve. Will the evolution
allow microfabs? Gee, I don't know. I have this dream of a truck
pulling up to the EE department's loading dock and leaving a
0.1-micron facility .. but just because an x-ray source will be small
(take that, IBM!) does not imply that the whole fab will be a small
enough number of units, and state-of-the-art, too.

To end on an enthusiastic note: I just saw a wonderful photo.  It
showed a corner of a 68040, before-and-after they used FIB to cut two
traces (!!) and run a patch wire (!!!!!!). The claim was that they
had designed unconnected bits of logic into odd places around the
chip, so that they could, say, cut out an inverter, and patch in a
nand in its place. Wow. In fact, gosh golly wow.
-- 
Don		D.C.Lindsay 	Carnegie Mellon Robotics Institute

rod@isi.edu (Rodney Doyle Van Meter III) (04/23/91)

Some of you are probably already familiar with MOSIS, but since the
discussion seems to be headed that way, I'll put in a plug for us. If
you're hooked in to MOSIS, you can design a chip (usually in scalable
CMOS design rules which we can provide you a copy of), submit it to be
fabbed (preferably by email), and in usually 8-12 weeks (depending on
which process, glitches we hit, etc.) you get back some number of
copies of your chip.

If you're at a university with VLSI design classes, the odds are good
you're already set up with us. Commercial people can get in, too. Call
(213)822-1511 and ask for MOSIS. Tell whoever answers that you want to
find out about fabbing through us -- they should know enough to get
you started on the paperwork.

Price? I don't know, since I'm not connected with production stuff (I
do unrelated programming & have never been familiar with any of that),
but I think the bottom is around $500 for four copies of a tiny chip,
which, I seem to recall somebody telling me, is good for about 10K
gates, depending on your design style. That's in 2 micron CMOS. We
also support 1.6, and maybe 1.2. You can submit designs of virtually
any size and request virtually any number of parts, but it'll cost you
more.

As for "automatic" design, my boss wrote a book called _VLSI: Silicon
Compilation and the Art of Automatic Microchip Design_, Ron Ayres,
Prentice-Hall, 1983. He can take logic equations and generate layout
(though I think it's pretty inefficient, it does work). This is
separate from MOSIS. There is (or used to be) a company called Silicon
Compilers which used his stuff. There are also techniques for taking
logic circuit designs and producing layout, but I'm completely
unfamiliar with them. I have no doubt that this area is being explored
in the research community, though I haven't a clue where you'd find
out about it.

That's virtually everything I know about both topics, but I could
probably refer you to people who know more if you're interested.

			--Rod

manley@optilink.UUCP (Dave Manley) (04/23/91)

From article <12742@pt.cs.cmu.edu>, by lindsay@gandalf.cs.cmu.edu (Donald Lindsay):
> In article <3329@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com 
> 	(bill davidsen) writes:
>>Will the day ever come when we can fast build custom CPUs?
>>  - fully automated chip layout.
>>  - A program the customer could run on a PC or workstation to select
>>    the options, and then send them in by email, floppy, or whatever.
>>  - direct computer controlled chip generation without a mask.
> 
> On the fabrication side, the trend has been to what you might call
> megafabs, with long construction times and nine digit price tags. I
> have been listening for years for hints that a minifab or microfab
> could be built, and things are looking up. 

Someone (I could try to find the reference) does sell a gate array 'microfab'.
I believe it is a 2u CMOS process.  Physically I think it (the fab not the array)
is about 20 feet on a side. I don't remember how large the array sizes were.
I think it is priced ~1M.

If fast is four weeks, United Silicon Structures (no, I don't work for
them) advertises 1-2u CMOS full custom, no minimum quantity.

Now, maybe your question should be: Will the day ever come when we can cheaply,
fast build custom CPUs?

mash@mips.com (John Mashey) (04/23/91)

In article <6286@optilink.UUCP> manley@optilink.UUCP (Dave Manley) writes:

>Now, maybe your question should be: Will the day ever come when we can cheaply,
>fast build custom CPUs?

If only it were so easy ....
One also needs to:
	a) Cheaply and quickly generate the corresponding set of
	diagnostics for both design verification and production.
	I especially want to see the ones for custom new designs with
	multi-processor, 2-level cache coherency... generated quickly...
and worse:
	b) Cheaply and quickly generate the corresponding set of
	compilers, debuggers, libraries....

Now, there has been progress in both of those areas, so it's hardly
hopeless..... but not easy :-)
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94088-3650

colwell@pdx023.pdx023 (Robert Colwell) (04/23/91)

In article <2548@spim.mips.COM> mash@mips.com (John Mashey) writes:
   In article <6286@optilink.UUCP> manley@optilink.UUCP (Dave Manley) writes:

   >Now, maybe your question should be: Will the day ever come when we can cheaply,
   >fast build custom CPUs?

   If only it were so easy ....
   One also needs to:
	   a) Cheaply and quickly generate the corresponding set of
	   diagnostics for both design verification and production.
	   I especially want to see the ones for custom new designs with
	   multi-processor, 2-level cache coherency... generated quickly...
   and worse:
	   b) Cheaply and quickly generate the corresponding set of
	   compilers, debuggers, libraries....

   Now, there has been progress in both of those areas, so it's hardly
   hopeless..... but not easy :-)

Said the spider to the fly...the micro guys have ruined the CPU design game for
most folks, IMHO (yeah, I know, I'm one of them now.)

There's so very much more to this than the cost of designing and fabbing your
first working production parts.  You need to design the system, too.  You need
software, including OS, compilers, assemblers, debuggers, linkers, & profiling
tools.  You need a sales force that understands your product and can sell to
customers.  You need a marketing organization that knows where the customers
hide and how to reach them.  You need a benchmarking crew, because nobody's
technology is so much better than everyone else's that they can live with
off-the-shelf performance across the board.  You need field service.  And you
need a story as to why somebody should take a chance on your system or processor
instead of going with a sure bet by somebody bigger.

I believe that the only hope for future garage-shop hardware designers is to get
faster & much cheaper fabs, but also to get faster & much cheaper logic
synthesis and simulation tools.  Ultimately, I believe it's hopeless to try to
design "custom CPUs"; if you do manage to overcome the big guys' economies of
scale and captive process technology, and you also manage to get to market
quicker than they do, and you achieve all of the things mentioned above, you
still need a significant performance edge (or some other value-added.)  Good
luck with that, too.  Personally, I believe the day has already come and gone,
just as it has in the auto industry.

There's an auto museum near Cape Cod, Mass., with a display near the front door
of the logos of all the car companies that existed from 1910 through the present
day.  It's quite sobering to see how many there once were compared to how many
survive today.  Imagine what it would take to start up a new one nowadays.

Bob Colwell  colwell@ichips.intel.com  503-696-4550
Intel Corp.  JF1-19
5200 NE Elam Young Parkway
Hillsboro, Oregon 97124

bhoughto@pima.intel.com (Blair P. Houghton) (04/24/91)

In article <COLWELL.91Apr23090526@pdx023.pdx023> colwell@pdx023.pdx023 (Robert Colwell) writes:
>There's an auto museum near Cape Cod, Mass., with a
>display near the front door of the logos of all the car
>companies that existed from 1910 through the present day.
>It's quite sobering to see how many there once were
>compared to how many survive today.  Imagine what it would
>take to start up a new one nowadays.

Ca. 1926 there were over 350 auto manufacturing companies in
the USA alone.  Now there are 3.  (If you count Saturn as 4,
you aren't paying attention.)

				--Blair
				  "It's trivial, it's irrelevant,
				   it's the only thing you'll
				   remember from today's news...
				   Welcome to Usenet."

lindsay@gandalf.cs.cmu.edu (Donald Lindsay) (04/25/91)

In article <2548@spim.mips.COM> mash@mips.com (John Mashey) writes:
>In article <6286@optilink.UUCP> manley@optilink.UUCP (Dave Manley) writes:
>>Now, maybe your question should be: Will the day ever come when we can
>>cheaply, fast build custom CPUs?
>One also needs to:
>	a) Cheaply and quickly generate the corresponding set of
>	diagnostics for both design verification and production.
>	b) Cheaply and quickly generate the corresponding set of
>	compilers, debuggers, libraries....

In article <COLWELL.91Apr23090526@pdx023.pdx023> colwell@pdx023.pdx023 
	(Robert Colwell) writes:
  [stuff I agree with, reinforcing the above]

What you say is true, but not relevant to most of the published 
daydreaming about instant custom chips.

Usually, the suggestion is that the chip will fit a special niche -
such as a radar autocorrelator chip or a pattern matcher chip - or
will be a coprocessor (in some loose sense of the word). The
general-purpose market is to be avoided, not only for the good
reasons which you gave, but also because it's increasingly hard to
find big wins there. In a niche, it may be possible to get an
enormous win: the Splash board is sometimes 200 times faster than a
16K-PE CM-2.

In particular, most daydreams have been about casting a single
specific algorithm to hardware. If a chemical-bonding problem is
going to take days to grind, why not make an overnight chip, that has
parallel execution units, one for each aspect of that particular
molecule?  And (more down to earth, or anyway MOSIS) why shouldn't an
encryption chip have a 500-bit-wide ALU?

In the science cases, the assumption is that the hardware
verification will be somewhat application-specific, too. The user
would be expected to have some test cases, and perhaps include e.g.
checks to see if a physically conserved property (angular momentum?)
has actually been conserved.
-- 
Don		D.C.Lindsay 	Carnegie Mellon Robotics Institute

buschman@tubsibr.uucp (Andreas Buschmann) (04/26/91)

lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:

>            And (more down to earth, or anyway MOSIS) why shouldn't an
>encryption chip have a 500-bit-wide ALU?

Some years ago there was an RSA encryption chip constructed here as a
project, which used a 900-bit-wide adder. It was full custom.
I don't know if more than some example chips were ever built, or even
if it is available somewhere. The rights are at Siemens now, at least I
think so, but I haven't heard of it again.


 /|)			Andreas	Buschmann
/-|)			TU Braunschweig, Germany (West)	
						  ^^^^  was

bitnet: buschman%tubsibr@dbsinf6.bitnet
uucp:   buschman@tubsibr.uucp

rph@cs.brown.edu (Richard Hughey) (04/26/91)

In article <12785@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:
>Usually, the suggestion is that the chip will fit a special niche -
>such as a radar autocorrelator chip or a pattern matcher chip - or
>will be a coprocessor (in some loose sense of the word). The
>general-purpose market is to be avoided, not only for the good
>reasons which you gave, but also because it's increasingly hard to
>find big wins there. In a niche, it may be possible to get an
>enormous win: the Splash board is sometimes 200 times faster than a
>16K-PE CM-2.
>
>Don		D.C.Lindsay 	Carnegie Mellon Robotics Institute

Comparing co-processors against the Connection Machine isn't
exactly the way to go - the CM-2 can be regarded as a massively
parallel (and massively COSTLY) general-purpose co-processor, a
great contrast to slightly- or non-parallel supercomputers.  Splash's
main advantage over the CM-2 is its cost - the CM-2's performance on
the sequence comparison example is more realistically only 10 times
slower than the Splash board on 100x100 sequence comparison (the
version mentioned in Computer is for distributed sequence comparison,
using 100 of the 16K PEs - 100x100 (or, equivalently, 100x128)
comparison can be done in 0.17 seconds, in comparison to Splash's
0.020 seconds [CM-2 performance could be further increased by a
factor of 4 or more by using minimum-size words, leading to a
somewhat more complicated program]).

Where Splash does win (vs CM-2) is on size (cost) and its ability
to prototype hardware designs before fabrication - programming
can be slow (the seq. comp. program has many many many lines of
code) but is much faster than designing and fabricating a new
system, which when up and running might not be the perfect
solution to a problem.

As part of my thesis, I've implemented a programmable linear
systolic array, designed specifically for combinatorial
applications (sequence comparison prime among them).  The system
(The Brown Systolic Array, or B-SYS) has traditional SIMD
programming with very efficient systolic communication.  Sequence
comparison variations run 5-40 lines of B-SYS code per cell
program, though some systolic programming issues I'm looking at
make this much easier.  There's a running 10-chip (470-processor)
prototype system that does simple seq. comparison about 1/20 the
speed of Splash, so slow because each instruction execution
requires 3 I/O writes over an ISA bus (ugh!).  A full
implementation (32 chips (1504 PEs) on a single board w/ local
instruction sequencer) could perform 3-5 8-bit GOPS (2x faster
than Splash).  A redesign of the chip to 0.8 micron CMOS could
increase PE density (and performance) by a factor of 10.
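
For readers unfamiliar with the application: the sequence-comparison
recurrence these arrays implement is ordinary edit-distance dynamic
programming.  A minimal software sketch (plain Python, my own
illustration, not B-SYS or Splash code - a systolic array assigns
roughly one column of this table per PE, so all cells on an
anti-diagonal update in parallel):

```python
# Edit-distance (Levenshtein) recurrence, row by row.  A linear
# systolic array evaluates the same recurrence with one PE per
# column, pipelining rows through the array.

def edit_distance(s, t):
    prev = list(range(len(t) + 1))        # row 0: cost of inserting t[:j]
    for i, a in enumerate(s, 1):
        cur = [i]                         # column 0: cost of deleting s[:i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (a != b)))   # match/substitute
        prev = cur
    return prev[-1]
```

A 100x100 comparison is just this table at n = 100; the hardware wins
come from evaluating each anti-diagonal of cells simultaneously.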

There's a paper upcoming in ICPP '91 on this, which I can send
preprints of to anyone interested.  Also, the tech report version
of my thesis should be out in a couple of months.


      - Richard



--------------------------------------- Richard Hughey              
INTERNET:  rph@cs.brown.edu	        Brown University            
BITNET:    rph@browncs		        Box 1910                    
(decvax, allegra, ...)!brunix!rph       Providence, RI 02912        

chased@rbbb.Eng.Sun.COM (David Chase) (04/26/91)

In article <12785@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:
>In particular, most daydreams have been about casting a single
>specific algorithm to hardware. If a chemical-bonding problem is
>going to take days to grind, why not make an overnight chip, that has
>parallel execution units, one for each aspect of that particular
>molecule?  And (more down to earth, or anyway MOSIS) why shouldn't an
>encryption chip have a 500-bit-wide ALU?

There's also a middle ground -- (speaking of 500-bit wide ALUs) check
out Computer Architecture News of March 1991, "Hardware Speedups in
Long Integer Multiplication", by Shand, Bertin, Vuillemin.  They used
"Programmable Active Memory" to implement (among other things) a 32 by
512 bit multiplier, and 200Kbit/sec RSA en/decryption (512-bit keys).

A PAM consists of a 5 by 5 array of LCA (Xilinx PGA data book) chips,
plus 4 megabits of static RAM.  Several of these were used to
implement fast RSA.  One point worth noting is that (as I understand
it) the PAM is reconfigured for each key -- the (automatic)
"compilation" to do this takes about 30 minutes (downloading to PAM is
much faster).

Note the benefits -- extreme customization allows high performance,
programmability allows turnaround in under one hour, and you can do
something else with the hardware when you are done with that problem.

Read the paper.  It's very interesting.
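
For context, the RSA core such hardware accelerates is modular
exponentiation over very wide integers.  A toy software sketch
(Python, my own illustration, using textbook-sized numbers rather
than real 512-bit keys - the wide multiplies and adds inside the loop
are what the PAM implements directly):

```python
# Square-and-multiply modular exponentiation: the inner loop of RSA.
# On 512-bit keys each multiply is a 512x512-bit operation, which is
# why wide custom datapaths pay off.

def modexp(base, exp, mod):
    result = 1
    base %= mod
    while exp:
        if exp & 1:
            result = (result * base) % mod
        base = (base * base) % mod
        exp >>= 1
    return result

# Textbook RSA round trip: n = 3233 (61 * 53), e = 17, d = 2753
assert modexp(modexp(65, 17, 3233), 2753, 3233) == 65
```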

David Chase
Sun

guy@auspex.auspex.com (Guy Harris) (04/26/91)

>>It's quite sobering to see how many there once were
>>compared to how many survive today.  Imagine what it would
>>take to start up a new one nowadays.
>
>Ca. 1926 there were over 350 auto manufacturing companies in
>the USA alone.  Now there are 3.  (If you count Saturn as 4,
>you aren't paying attention.)

There are, however, some start-ups in the automotive industry; the ones
I know about, however, are all building supercars.  (If you count Acura,
Lexus, or Infiniti as startups, you aren't paying attention. :-))  How
many will survive, I dunno.  There may be an analogy to be drawn
here; cf. Robert Colwell's previous employer....

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (04/29/91)

At the time there were hundreds of companies, each producing a few
hundred cars a year, at a high price, mostly by hand. And there are
still a number of companies which build small runs, since there's a
cutoff in the EPA regs at something like 100 or 200 unit/year. Smaller
firms have less stringent regulation.

If I hit the lottery one of my cars will be an Avanti...
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
        "Most of the VAX instructions are in microcode,
         but halt and no-op are in hardware for efficiency"

peter@ficc.ferranti.com (peter da silva) (04/30/91)

In article <3394@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
> And there are still a number of companies which build small runs...

> If I hit the lottery one of my cars will be an Avanti...

You mean a Corvette or Trans Am with a fiberglass shell based on an
old Studebaker?

There are plenty of companies that do the same thing in the computer
world today. They're called VARs.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

zaphod@madnix.UUCP (Ron Bean) (05/03/91)

In Article <B0.A_07@xds13.ferranti.com>, peter@ficc.ferranti.com
(peter da silva) writes:
 
>> And there are still a number of companies which build small runs...
>
>> If I hit the lottery one of my cars will be an Avanti...
>
>You mean a Corvette or Trans Am with a fiberglass shell based on an
>old Studebaker?                                         ^^^^^^^^
 
   Try "built with Studebaker's original tooling". And that
includes the chassis; only the engine and transmission come from
Chevrolet. They've updated the design a bit in recent years, so I
don't know how much of the old tooling remains, but they still
build it from the ground up (Avantis have always had fiberglass
bodies).
 
   A few years ago it was said that there were enough companies
making replacement parts for Ford Model-A's that you could build
a brand-new one, with no original parts (I suppose the same could
be said of the IBM-PC :-). Perhaps the most likely motivation for
a small-run custom CPU would be for nostalgic reasons (ie, PDP-10
on-a-chip). That way, you don't need a rational reason to do it.
 
==================
zaphod@madnix.UUCP (Ron Bean)
{harvard|rutgers|ucbvax}!uwvax!astroatc!nicmad!madnix!zaphod

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (05/06/91)

In article <1835@madnix.UUCP> zaphod@madnix.UUCP (Ron Bean) writes:

|                            Perhaps the most likely motivation for
| a small-run custom CPU would be for nostalgic reasons (ie, PDP-10
| on-a-chip). That way, you don't need a rational reason to do it.

  I think there would be a nice market for someone building a
DPS-whatever (MULTICS engine) even today. MULTICS could become the PC
operating system of choice for the next millennium.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
        "Most of the VAX instructions are in microcode,
         but halt and no-op are in hardware for efficiency"